Oracle 11g R2: strange behavior on index

I have the query:
SELECT count(*)
FROM
(
SELECT
TBELENCO.DATA_PROC, TBELENCO.POD, TBELENCO.DESCRIZIONE, TBELENCO.ERROR, TBELENCO.STATO,
TBELENCO.SEZIONE, TBELENCO.NOME_FILE, TBELENCO.ID_CARICAMENTO, TBELENCO.ESITO_OPERAZIONE,
TBELENCO.DES_TIPO_MISURA,
--TBELENCO.RAGIONE_SOCIALE,
--ROW_NUMBER() OVER (ORDER BY TBELENCO.DATA_PROC DESC) R
ROWNUM R
FROM(
SELECT
LOG.DATA_PROC, LOG.POD, LOG.DESCRIZIONE, LOG.ERROR, LOG.STATO,
LOG.SEZIONE, LOG.NOME_FILE, LOG.ID_CARICAMENTO, LOG.ESITO_OPERAZIONE, TM.DES_TIPO_MISURA
--,C.RAGIONE_SOCIALE
--ROW_NUMBER() OVER (ORDER BY LOG.DATA_PROC DESC) R
FROM
MS042_LOADING_LOGS LOG JOIN MS116_MEASURE_TYPES TM ON
TM.ID_TIPO_MISURA=LOG.SEZIONE
-- LEFT JOIN(
-- SELECT CUST.RAGIONE_SOCIALE,STR.POD,RSC.DATA_DA, RSC.DATA_A
-- FROM
-- MS038_METERS STR JOIN MS036_REL_SITES_CUSTOMERS RSC ON
-- STR.ID_SITO=RSC.ID_SITO
-- JOIN MS030_CUSTOMERS CUST ON
-- CUST.ID_CLIENTE=RSC.ID_CLIENTE
-- ) C ON
-- C.POD=LOG.POD
--AND LOG.DATA_PROC BETWEEN C.DATA_DA AND C.DATA_A
WHERE
1=1
--AND LOG.DATA_PROC>=TRUNC(SYSDATE)
AND LOG.DATA_PROC>=TRUNC(SYSDATE)-3
--TO_DATE('01/11/2014', 'DD/MM/YYYY')
) TBELENCO
)
WHERE
R BETWEEN 1 AND 200;
If I execute the query with AND LOG.DATA_PROC>=TRUNC(SYSDATE)-3, Oracle uses the index on the DATA_PROC column of the MS042_LOADING_LOGS (LOG) table. If I instead use AND LOG.DATA_PROC>=TRUNC(SYSDATE)-4 (or -5, -6, etc.), it does a full table scan (TABLE ACCESS FULL). Why does it behave this way?
I also ran:
ALTER INDEX MS042_DATA_PROC_IDX REBUILD;
but nothing changed.
Thanks,
Igor
--***********************************************************
SELECT count(*)
FROM
(
SELECT
TBELENCO.DATA_PROC, TBELENCO.POD, TBELENCO.DESCRIZIONE, TBELENCO.ERROR, TBELENCO.STATO,
TBELENCO.SEZIONE, TBELENCO.NOME_FILE, TBELENCO.ID_CARICAMENTO, TBELENCO.ESITO_OPERAZIONE,
TBELENCO.DES_TIPO_MISURA,
ROWNUM R
FROM(
SELECT
LOG.DATA_PROC, LOG.POD, LOG.DESCRIZIONE, LOG.ERROR, LOG.STATO,
LOG.SEZIONE, LOG.NOME_FILE, LOG.ID_CARICAMENTO, LOG.ESITO_OPERAZIONE, TM.DES_TIPO_MISURA
FROM
MS042_LOADING_LOGS LOG JOIN MS116_MEASURE_TYPES TM ON
TM.ID_TIPO_MISURA=LOG.SEZIONE
WHERE
1=1
AND LOG.DATA_PROC>=TRUNC(SYSDATE)-1
) TBELENCO
)
WHERE
R BETWEEN 1 AND 200;
Plan hash value: 2191058229
-------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 30866 (2)| 00:06:11 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | VIEW | | 94236 | 1196K| 30866 (2)| 00:06:11 |
| 3 | COUNT | | | | | |
|* 4 | HASH JOIN | | 94236 | 1104K| 30866 (2)| 00:06:11 |
| 5 | INDEX FULL SCAN | P087_TIPI_MISURE_PK | 15 | 30 | 1 (0)| 00:00:01 |
| 6 | TABLE ACCESS BY INDEX ROWID| MS042_LOADING_LOGS | 94236 | 920K| 30864 (2)| 00:06:11 |
|* 7 | INDEX RANGE SCAN | MS042_DATA_PROC_IDX | 94236 | | 25742 (2)| 00:05:09 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("R"<=200 AND "R">=1)
4 - access("TM"."ID_TIPO_MISURA"="LOG"."SEZIONE")
7 - access(SYS_OP_DESCEND("DATA_PROC")<=SYS_OP_DESCEND(TRUNC(SYSDATE#!)-1))
filter(SYS_OP_UNDESCEND(SYS_OP_DESCEND("DATA_PROC"))>=TRUNC(SYSDATE#!)-1)
Plan hash value: 69930686
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 13 | 95921 (1)| 00:19:12 |
| 1 | SORT AGGREGATE | | 1 | 13 | | |
|* 2 | VIEW | | 1467K| 18M| 95921 (1)| 00:19:12 |
| 3 | COUNT | | | | | |
|* 4 | HASH JOIN | | 1467K| 16M| 95921 (1)| 00:19:12 |
| 5 | INDEX FULL SCAN | P087_TIPI_MISURE_PK | 15 | 30 | 1 (0)| 00:00:01 |
|* 6 | TABLE ACCESS FULL| MS042_LOADING_LOGS | 1467K| 13M| 95912 (1)| 00:19:11 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("R"<=200 AND "R">=1)
4 - access("TM"."ID_TIPO_MISURA"="LOG"."SEZIONE")
6 - filter("LOG"."DATA_PROC">=TRUNC(SYSDATE#!)-4)

The larger the fraction of rows that will be returned, the more efficient a table scan is and the less efficient it is to use an index. Apparently, Oracle expects that inflection point to come when the query returns more than 3 days of data. If that is inaccurate, I would expect that the statistics on your table or indexes are inaccurate.
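To make the inflection point concrete, here is a rough back-of-the-envelope sketch that uses only the numbers from the two plans in the question. It assumes that the cost of the index plan grows linearly with the estimated row count while the full-scan cost stays constant. That is my simplification, not Oracle's actual costing model.

```python
# Numbers copied from the two execution plans in the question.
index_plan_rows = 94236   # estimated rows for DATA_PROC >= TRUNC(SYSDATE)-1
index_plan_cost = 30866   # total cost of the plan using the index range scan
full_scan_cost = 95921    # total cost of the plan using TABLE ACCESS FULL

# Simplifying assumption (not Oracle's real model): index-plan cost is linear in rows.
cost_per_row = index_plan_cost / index_plan_rows

# Estimated row count at which the index plan costs as much as the full scan.
break_even_rows = full_scan_cost / cost_per_row

# The same threshold expressed as a multiple of the -1 row estimate.
break_even_multiple = break_even_rows / index_plan_rows

print(f"cost per row via index : {cost_per_row:.4f}")
print(f"break-even row estimate: {break_even_rows:,.0f}")
print(f"multiple of -1 estimate: {break_even_multiple:.2f}")
```

Under that assumption the break-even lands at roughly three times the row estimate for TRUNC(SYSDATE)-1, which is at least in line with the switch you see between -3 and -4. The real optimizer also weighs things like the index clustering factor and multiblock read cost, so treat this purely as an illustration of why a threshold exists.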

Related

How to run code written for MySQL 8 on MySQL 5.7?

I have the following data:
+---------+--------+----------+------+-------+--------+-----------+
| xType | xAccID | xAccName | xCat | xYear | xMonth | xRaseed |
+---------+--------+----------+------+-------+--------+-----------+
| Amounts | 52 | Acc1 | Rs | 2020 | 11 | 3144.83 |
| Amounts | 52 | Acc1 | Rs | 2020 | 12 | -15199.64 |
| Amounts | 53 | Acc2 | Cus | 2020 | 12 | 5306.04 |
| Amounts | 53 | Acc2 | Cus | 2020 | 11 | 1090.64 |
+---------+--------+----------+------+-------+--------+-----------+
Actually, I want to add the xRaseed of the current row to the xRaseed of the previous row, for each xAccID separately.
The result that I want:
+---------+--------+----------+------+-------+--------+--------------------------------+
| xType | xAccID | xAccName | xCat | xYear | xMonth | xRaseed |
+---------+--------+----------+------+-------+--------+--------------------------------+
| Amounts | 52 | Acc1 | Rs | 2020 | 11 | 3144.83 |
| Amounts | 52 | Acc1 | Rs | 2020 | 12 | Not -15199.64 But (-12,054.81) |
| Amounts | 53 | Acc2 | Cus | 2020 | 12 | 5306.04 |
| Amounts | 53 | Acc2 | Cus | 2020 | 11 | Not 1090.64 But (6,396.68) |
+---------+--------+----------+------+-------+--------+--------------------------------+
I applied the following solution that I got from somebody here:
select t.*,
sum(xRaseed) over (partition by xAccID order by xYear, xMonth) as running_xRaseed
from t;
Everything worked on my local server, but when I applied the solution on my hosting, it didn't work. Locally I use XAMPP (10.4.17-MariaDB), and on my hosting I use MySQL 5.7.23-23. What's the problem, please?
Here is a db<>fiddle
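Window functions such as SUM() OVER (...) were only added in MySQL 8.0, so the query fails on MySQL 5.7. On versions of MySQL earlier than 8.0, we can use a correlated subquery to find the rolling sum: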
SELECT xType, xAccID, xAccName, xCat, xYear, xMonth,
       (SELECT SUM(t2.xRaseed)
        FROM yourTable t2
        WHERE t2.xAccID = t1.xAccID AND
              (t2.xYear < t1.xYear OR
               t2.xYear = t1.xYear AND t2.xMonth <= t1.xMonth)) AS xRaseed
FROM yourTable t1
ORDER BY
    xAccID,
    xYear,
    xMonth;
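As a quick sanity check, here is a minimal sketch that runs the same correlated subquery against the four sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for the MySQL 5.7 server (an assumption made only so the example is self-contained) and keeps just the columns the calculation needs.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL 5.7 server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (xAccID INTEGER, xYear INTEGER, xMonth INTEGER, xRaseed REAL)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [
        (52, 2020, 11, 3144.83),
        (52, 2020, 12, -15199.64),
        (53, 2020, 12, 5306.04),
        (53, 2020, 11, 1090.64),
    ],
)

# Same correlated subquery as above, trimmed to the columns needed here.
rows = conn.execute(
    """
    SELECT t1.xAccID, t1.xYear, t1.xMonth,
           (SELECT SUM(t2.xRaseed)
              FROM yourTable t2
             WHERE t2.xAccID = t1.xAccID
               AND (t2.xYear < t1.xYear
                    OR (t2.xYear = t1.xYear AND t2.xMonth <= t1.xMonth))) AS running
      FROM yourTable t1
     ORDER BY t1.xAccID, t1.xYear, t1.xMonth
    """
).fetchall()

# Round to cents so floating-point noise does not obscure the result.
running_totals = [(acc, year, month, round(total, 2)) for acc, year, month, total in rows]
for row in running_totals:
    print(row)
```

The December row of account 52 comes out as -12054.81 and the December row of account 53 as 6396.68, matching the totals asked for in the question.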

How to merge records based on consecutive fields in Teradata

I have a source table like the one below:
+---------+--------+---------+------+
| ID      | SEQ_NO | UNIT_ID | D_ID |
+---------+--------+---------+------+
| 7979092 |      1 |      99 |  759 |
| 7979092 |      2 |      -1 |  869 |
| 7979092 |      3 |      -1 |  927 |
| 7979092 |      4 |      -1 |  812 |
| 7979092 |      5 |      99 |  900 |
| 7979092 |      6 |      99 |  891 |
| 7979092 |      7 |      -1 |  785 |
| 7979092 |      8 |      -1 |  762 |
| 7979092 |      9 |      -1 |  923 |
+---------+--------+---------+------+
I have to merge rows when consecutive rows have the same UNIT_ID, taking max(D_ID) when consolidating them. The expected output is:
+---------+---------+------+
| ID | UNIT_ID | D_ID |
+---------+---------+------+
| 7979092 | 99 | 759 |
| 7979092 | -1 | 927 |
| 7979092 | 99 | 900 |
| 7979092 | -1 | 923 |
+---------+---------+------+
I tried to solve this with Teradata's ordered analytical functions, but could not find a solution. I use Teradata 16.
Thank You.
This logic is a bit quirky; it's based on two sequences created by different sort orders:
SELECT
   ID
  ,UNIT_ID
  ,Max(D_ID)
FROM
 (
   SELECT
      ID
     ,SEQ_NO
     ,UNIT_ID
     ,D_ID
     -- assign the same value to consecutive UNIT_IDs
     ,SEQ_NO -
      Row_Number()
      Over(PARTITION BY ID, UNIT_ID
           ORDER BY SEQ_NO) AS grp
   FROM tab
 ) AS dt
GROUP BY 1,2,grp
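To see why subtracting the two sequences works, here is a small sketch that replays the trick on the nine sample rows. It runs through Python's sqlite3 module rather than Teradata (an assumption for the sake of a self-contained example; it needs SQLite 3.25 or later for window functions), and it spells out the GROUP BY columns instead of using positional references.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (ID INTEGER, SEQ_NO INTEGER, UNIT_ID INTEGER, D_ID INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (7979092, ?, ?, ?)",
    [(1, 99, 759), (2, -1, 869), (3, -1, 927), (4, -1, 812), (5, 99, 900),
     (6, 99, 891), (7, -1, 785), (8, -1, 762), (9, -1, 923)],
)

# Step 1: SEQ_NO minus the per-UNIT_ID row number is constant within each run.
grps = [
    r[0]
    for r in conn.execute(
        """
        SELECT SEQ_NO - ROW_NUMBER() OVER (PARTITION BY ID, UNIT_ID ORDER BY SEQ_NO) AS grp
          FROM tab
         ORDER BY SEQ_NO
        """
    )
]
print(grps)

# Step 2: group on (ID, UNIT_ID, grp); grp alone is not unique across UNIT_IDs.
merged = conn.execute(
    """
    SELECT ID, UNIT_ID, MAX(D_ID) AS D_ID
      FROM (SELECT ID, SEQ_NO, UNIT_ID, D_ID,
                   SEQ_NO - ROW_NUMBER() OVER (PARTITION BY ID, UNIT_ID ORDER BY SEQ_NO) AS grp
              FROM tab) AS dt
     GROUP BY ID, UNIT_ID, grp
     ORDER BY MIN(SEQ_NO)
    """
).fetchall()
for row in merged:
    print(row)
```

Note that the value 3 shows up as grp for both a run of 99s and a run of -1s, which is why UNIT_ID has to stay in the GROUP BY.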
You can use RESET WHEN to dynamically create groups within the window. Here's one way to do it:
select ID, UNIT_ID,
  max(D_ID) over(
    partition by ID order by SEQ_NO
    reset when UNIT_ID <> UNIT_ID_prev -- Create new group for new value
  ) as D_ID
from (
  select ID, SEQ_NO, UNIT_ID, D_ID,
    lag(UNIT_ID) over(partition by ID order by SEQ_NO) as UNIT_ID_prev -- Previous value
  from MY_TABLE
) src
qualify row_number() over(
  partition by ID order by SEQ_NO
  reset when UNIT_ID <> UNIT_ID_prev -- Match original max() window
) = 1 -- One row per group (similar to DISTINCT)

Default value for LAG function in MariaDB

I'm trying to build a view which allows me to track the difference between paid values at two consecutive month_ids. When a figure is missing, however, that is because it's the first entry and therefore has a paid amount of 0. At present, I'm using the expression below to represent the previous figure, since the [, default] argument of LAG has not been implemented in MariaDB.
CASE WHEN (
    NOT(policy_agent_month.policy_agent_month_id IS NOT NULL
        AND LAG(days_paid, 1) OVER (PARTITION BY claim_id ORDER BY month_id) IS NULL)) THEN
    LAG(days_paid, 1) OVER (PARTITION BY claim_id ORDER BY month_id)
ELSE
    0
END
The problem I have with this is that I have about 30 variables which this function needs to be applied over and it makes my code unreadable and very clunky. Is there a better solution?
Why use WITH?
SELECT province, tot_pop,
tot_pop - COALESCE(
(LAG(tot_pop) OVER (ORDER BY tot_pop ASC)),
0) AS delta
FROM provinces
ORDER BY tot_pop asc;
+---------------------------+----------+---------+
| province | tot_pop | delta |
+---------------------------+----------+---------+
| Nunavut | 14585 | 14585 |
| Yukon | 21304 | 6719 |
| Northwest Territories | 24571 | 3267 |
| Prince Edward Island | 63071 | 38500 |
| Newfoundland and Labrador | 100761 | 37690 |
| New Brunswick | 332715 | 231954 |
| Nova Scotia | 471284 | 138569 |
| Saskatchewan | 622467 | 151183 |
| Manitoba | 772672 | 150205 |
| Alberta | 2481213 | 1708541 |
| British Columbia | 3287519 | 806306 |
| Quebec | 5321098 | 2033579 |
| Ontario | 10071458 | 4750360 |
+---------------------------+----------+---------+
13 rows in set (0.00 sec)
However, it is not cheap (at least in MySQL 8.0). The table has only 13 rows, yet measuring the query with
FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';
shows far more handler operations than that.
MySQL 8.0:
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_read_rnd | 89 |
| Handler_read_rnd_next | 52 |
| Handler_write | 26 |
(and others)
MariaDB 10.3:
| Handler_read_rnd | 77 |
| Handler_read_rnd_next | 42 |
| Handler_tmp_write | 13 |
| Handler_update | 13 |
You can use a CTE (Common Table Expression) in MariaDB 10.2+ to pre-compute frequently used expressions and name them for later use:
with
x as ( -- first we compute the CTE that we name "x"
  select
    *,
    coalesce(
      LAG(days_paid, 1) OVER (PARTITION BY claim_id ORDER BY month_id),
      123456
    ) as prev_month -- this expression gets the name "prev_month"
  from my_table -- or a simple/complex join here
)
select -- now the main query
  prev_month
from x
... -- rest of your query here where "prev_month" is computed.
In the main query prev_month has the lag value, or the default value 123456 when it's null.
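Since the question mentions about 30 columns, one option is to generate the repetitive COALESCE(LAG(...)) expressions instead of typing them. Below is a minimal sketch of that idea in Python, checked against an in-memory SQLite database (3.25 or later) rather than MariaDB. The helper lag_or_default, the column amount_paid, and the sample rows are all hypothetical, and the column names must come from a trusted list because they are interpolated into the SQL text.

```python
import sqlite3

def lag_or_default(column, default=0):
    """Build one COALESCE(LAG(...)) select item (hypothetical helper)."""
    return (
        f"COALESCE(LAG({column}, 1) OVER (PARTITION BY claim_id ORDER BY month_id), "
        f"{default}) AS prev_{column}"
    )

# Hypothetical column list; in practice this would hold the ~30 real column names.
columns = ["days_paid", "amount_paid"]

sql = (
    "SELECT claim_id, month_id,\n       "
    + ",\n       ".join(lag_or_default(c) for c in columns)
    + "\n  FROM my_table\n ORDER BY claim_id, month_id"
)
print(sql)

# Tiny made-up data set to confirm the generated query behaves as intended.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (claim_id INTEGER, month_id INTEGER, "
    "days_paid INTEGER, amount_paid INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?, ?, ?)",
    [(1, 1, 5, 100), (1, 2, 8, 160), (2, 1, 3, 60), (2, 2, 10, 200)],
)
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

The first month of each claim gets 0 for every prev_ column, and later months get the previous month's value. The generated text can be pasted into the view definition, so the view itself stays plain SQL.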

Optimizing query that looks at a specific time window each day

This is a followup to my previous question
Optimizing query to get entire row where one field is the maximum for a group
I'll change the names from what I used there to make them a little more memorable, but these don't represent my actual use-case (so don't estimate the number of records from them).
I have a table with a schema like this:
OrderTime DATETIME(6),
Customer VARCHAR(50),
DrinkPrice DECIMAL,
Bartender VARCHAR(50),
TimeToPrepareDrink TIME(6),
...
I'd like to extract the rows from the table representing each customer's most expensive drink order during happy hour (3 PM - 6 PM) each day. So for instance I'd want results like
Date | Customer | OrderTime | MaxPrice | Bartender | ...
-------+----------+-------------+------------+-----------+-----
1/1/18 | Alice | 1/1/18 3:45 | 13.15 | Jane | ...
1/1/18 | Bob | 1/1/18 5:12 | 9.08 | Jane | ...
1/1/18 | Carol | 1/1/18 4:45 | 20.00 | Tarzan | ...
1/2/18 | Alice | 1/2/18 3:45 | 13.15 | Jane | ...
1/2/18 | Bob | 1/2/18 5:57 | 6.00 | Tarzan | ...
1/2/18 | Carol | 1/2/18 3:13 | 6.00 | Tarzan | ...
...
The table has an index on OrderTime, and contains tens of billions of records. (My customers are heavy drinkers).
Thanks to the previous question I'm able to extract this for a specific day pretty easily. I can do something like:
SELECT * FROM orders b
INNER JOIN (
SELECT Customer, MAX(DrinkPrice) as MaxPrice
FROM orders
WHERE OrderTime >= '2018-01-01 15:00'
AND OrderTime <= '2018-01-01 18:00'
GROUP BY Customer
) AS a
ON a.Customer = b.Customer
AND a.MaxPrice = b.DrinkPrice
WHERE b.OrderTime >= '2018-01-01 15:00'
AND b.OrderTime <= '2018-01-01 18:00';
This query runs in less than a second. The explain plan looks like this:
+---+-------------+------------+-------+---------------+------------+--------------------+--------------------------------------------------------+
| id| select_type | table | type | possible_keys | key | ref | Extra |
+---+-------------+------------+-------+---------------+------------+--------------------+--------------------------------------------------------+
| 1 | PRIMARY | b | range | OrderTime | OrderTime | NULL | Using index condition |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | b.Customer,b.Price | |
| 2 | DERIVED | orders | range | OrderTime | OrderTime | NULL | Using index condition; Using temporary; Using filesort |
+---+-------------+------------+-------+---------------+------------+--------------------+--------------------------------------------------------+
I can also get the information about the relevant rows for my query:
SELECT Date, Customer, MAX(DrinkPrice) AS MaxPrice
FROM
orders
INNER JOIN
(SELECT '2018-01-01' AS Date
UNION
SELECT '2018-01-02' AS Date) dates
WHERE OrderTime >= TIMESTAMP(Date, '15:00:00')
AND OrderTime <= TIMESTAMP(Date, '18:00:00')
GROUP BY Date, Customer
HAVING MaxPrice > 0;
This query also runs in less than a second. Here's how its explain plan looks:
+------+--------------+------------+------+---------------+------+------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | ref | Extra |
+------+--------------+------------+------+---------------+------+------+------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | Using temporary; Using filesort |
| 1 | PRIMARY | orders | ALL | OrderTime | NULL | NULL | Range checked for each record (index map: 0x1) |
| 2 | DERIVED | NULL | NULL | NULL | NULL | NULL | No tables used |
| 3 | UNION | NULL | NULL | NULL | NULL | NULL | No tables used |
| NULL | UNION RESULT | <union2,3> | ALL | NULL | NULL | NULL | |
+------+--------------+------------+------+---------------+------+------+------------------------------------------------+
The problem now is retrieving the remaining fields from the table. I tried adapting the trick from before, like so:
SELECT * FROM
orders a
INNER JOIN
(SELECT Date, Customer, MAX(DrinkPrice) AS MaxPrice
FROM
orders
INNER JOIN
(SELECT '2018-01-01' AS Date
UNION
SELECT '2018-01-02' AS Date) dates
WHERE OrderTime >= TIMESTAMP(Date, '15:00:00')
AND OrderTime <= TIMESTAMP(Date, '18:00:00')
GROUP BY Date, Customer
HAVING MaxPrice > 0) b
ON a.OrderTime >= TIMESTAMP(b.Date, '15:00:00')
AND a.OrderTime <= TIMESTAMP(b.Date, '18:00:00')
AND a.Customer = b.Customer;
However, for reasons I don't understand, the database chooses to execute this in a way that takes forever. Explain plan:
+------+--------------+------------+------+---------------+------+------------+------------------------------------------------+
| id | select_type | table | type | possible_keys | key | ref | Extra |
+------+--------------+------------+------+---------------+------+------------+------------------------------------------------+
| 1 | PRIMARY | a | ALL | OrderTime | NULL | NULL | |
| 1 | PRIMARY | <derived2> | ref | key0 | key0 | a.Customer | Using where |
| 2 | DERIVED | <derived3> | ALL | NULL | NULL | NULL | Using temporary; Using filesort |
| 2 | DERIVED | orders | ALL | OrderTime | NULL | NULL | Range checked for each record (index map: 0x1) |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | No tables used |
| 4 | UNION | NULL | NULL | NULL | NULL | NULL | No tables used |
| NULL | UNION RESULT | <union3,4> | ALL | NULL | NULL | NULL | |
+------+--------------+------------+------+---------------+------+------------+------------------------------------------------+
Questions:
What is going on here?
How can I fix it?
To extract the rows from the table representing each customer's most expensive drink order during happy hour (3 PM - 6 PM) each day I would use row_number() over() within a case expression evaluating the hour of day, like this:
CREATE TABLE mytable(
Date DATE
,Customer VARCHAR(10)
,OrderTime DATETIME
,MaxPrice NUMERIC(12,2)
,Bartender VARCHAR(11)
);
Note: the OrderTime values were changed from the question's sample data.
INSERT INTO mytable(Date,Customer,OrderTime,MaxPrice,Bartender)
VALUES
('1/1/18','Alice','1/1/18 13:45',13.15,'Jane')
, ('1/1/18','Bob' ,'1/1/18 15:12', 9.08,'Jane')
, ('1/2/18','Alice','1/2/18 13:45',13.15,'Jane')
, ('1/2/18','Bob' ,'1/2/18 15:57', 6.00,'Tarzan')
, ('1/2/18','Carol','1/2/18 13:13', 6.00,'Tarzan')
;
The suggested query is this:
select
*
from (
select
*
, case when hour(OrderTime) between 15 and 18 then
row_number() over(partition by `Date`, customer
order by MaxPrice DESC)
else null
end rn
from mytable
) d
where rn = 1
;
and the result will give access to all columns you include in the derived table.
Date | Customer | OrderTime | MaxPrice | Bartender | rn
:--------- | :------- | :------------------ | -------: | :-------- | -:
0001-01-18 | Bob | 0001-01-18 15:12:00 | 9.08 | Jane | 1
0001-02-18 | Bob | 0001-02-18 15:57:00 | 6.00 | Tarzan | 1
To help display how this works, running the derived table subquery:
select
*
, case when hour(OrderTime) between 15 and 18 then
row_number() over(partition by `Date`, customer order by MaxPrice DESC)
else null
end rn
from mytable
;
produces this interim resultset:
Date | Customer | OrderTime | MaxPrice | Bartender | rn
:--------- | :------- | :------------------ | -------: | :-------- | ---:
0001-01-18 | Alice | 0001-01-18 13:45:00 | 13.15 | Jane | null
0001-01-18 | Bob | 0001-01-18 15:12:00 | 9.08 | Jane | 1
0001-02-18 | Alice | 0001-02-18 13:45:00 | 13.15 | Jane | null
0001-02-18 | Bob | 0001-02-18 15:57:00 | 6.00 | Tarzan | 1
0001-02-18 | Carol | 0001-02-18 13:13:00 | 6.00 | Tarzan | null
db<>fiddle here
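One caveat worth checking with this pattern: ROW_NUMBER() is evaluated over every row of the partition, including orders outside happy hour, so a pricier order placed outside the window can take rn = 1 and then be nulled out by the CASE, leaving no row for that customer and day. A minimal sketch of a variant that filters first and ranks afterwards is shown below. It runs on Python's sqlite3 module (3.25 or later) as a stand-in for the real server, and the table contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (OrderTime TEXT, Customer TEXT, DrinkPrice REAL, Bartender TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("2018-01-01 13:45:00", "Alice", 25.00, "Jane"),    # priciest, but outside happy hour
        ("2018-01-01 15:45:00", "Alice", 13.15, "Jane"),
        ("2018-01-01 16:10:00", "Alice", 7.50, "Tarzan"),
        ("2018-01-01 17:12:00", "Bob", 9.08, "Jane"),
        ("2018-01-02 15:57:00", "Bob", 6.00, "Tarzan"),
    ],
)

# Filter to happy hour in WHERE, so only those rows are ranked.
rows = conn.execute(
    """
    SELECT OrderDate, Customer, OrderTime, DrinkPrice, Bartender
      FROM (SELECT date(OrderTime) AS OrderDate, Customer, OrderTime, DrinkPrice, Bartender,
                   ROW_NUMBER() OVER (PARTITION BY date(OrderTime), Customer
                                      ORDER BY DrinkPrice DESC) AS rn
              FROM orders
             WHERE time(OrderTime) BETWEEN '15:00:00' AND '18:00:00') AS d
     WHERE rn = 1
     ORDER BY OrderDate, Customer
    """
).fetchall()
for row in rows:
    print(row)
```

Alice's 13:45 order is excluded before ranking, so her 15:45 order is returned. Keep in mind that a predicate on a function of OrderTime generally cannot use a plain index on OrderTime, so on a very large table you would still want to bound OrderTime with a date range first.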
The task seems to be a "groupwise-max" problem. Here's one approach, involving only 2 'queries' (the inner one is called a "derived table").
SELECT x.OrderDate, x.Customer, b.OrderTime,
       x.MaxPrice, b.Bartender
FROM
  (
    SELECT DATE(OrderTime) AS OrderDate,
           Customer,
           MAX(Price) AS MaxPrice
    FROM tbl
    WHERE TIME(OrderTime) BETWEEN '15:00' AND '18:00'
    GROUP BY OrderDate, Customer
  ) AS x
JOIN tbl AS b
   ON DATE(b.OrderTime) = x.OrderDate  -- the schema shown has no OrderDate column, so derive it from OrderTime
  AND b.Customer = x.Customer
  AND b.Price = x.MaxPrice
WHERE TIME(b.OrderTime) BETWEEN '15:00' AND '18:00'
ORDER BY x.OrderDate, x.Customer
Desirable index:
INDEX(Customer, Price)
(There's no good reason to be using MyISAM.)
Billions of new rows per day
This adds new wrinkles. That's upwards of a terabyte of additional disk space needed each and every day?
Is it possible to summarize the data? The goal here is to add summary info as the new data comes in, and never have to re-scan the billions of old data. This may also let you remove all the secondary indexes on the Fact table.
Normalization will help shrink the table size, hence speeding up the queries. Bartender and Customer are prime candidates for this -- perhaps a SMALLINT UNSIGNED (2 bytes; 65K values) for the former and MEDIUMINT UNSIGNED (3 bytes; 16M values) for the latter. That would probably shrink the 5 columns you currently show by 50%. You may get a 2x speedup on many operations after normalizing.
Normalization is best done by 'staging' the data -- Load the data into a temporary table, normalize within it, summarize it, then copy into the main Fact table.
See http://mysql.rjweb.org/doc.php/summarytables
and http://mysql.rjweb.org/doc.php/staging_table
Before getting back to the question of optimizing the one query, we need to see the schema, the data flow, whether things can be normalized, whether summary tables can be effective, etc. I would hope to have the 'answer' for the query to be mostly digested in a summary table. Sometimes this leads to a 10x speedup.

Oracle Index Organized Table (IOT) order by primary key suffix better than by primary key prefix

I use an Index Organized Table (IOT) for a table with 550 M rows. The primary key is composed of two columns (id1 and id2), which are also foreign keys to two other tables (id1 references table1, id2 references table2).
When using an IOT, according to Oracle doc (http://docs.oracle.com/cd/B28359_01/server.111/b28310/tables012.htm#i1007389), sorting by prefix of the primary key should be faster than sorting by suffix of the primary key. Indeed, rows are sorted according to the primary key and the order of the columns composing it.
However, here are the explain plans I get when I join the IOT with the two other tables and order by either id1 or id2. I would have expected a better cost when sorting by id1, but that is not the case.
The value used for id1 corresponds to 58000 rows among 489000 rows in total in table1. The value used for id2 corresponds to 760 rows among 248900 rows in total in table2.
The SQL query :
SELECT a.id1, a.id2, a.some_column
FROM iot_table a
INNER JOIN table1 t1 ON t1.id = a.id1
INNER JOIN table2 t2 ON t2.id = a.id2
WHERE t1.col_x = x AND t2.col_y = y
ORDER BY {a.id1|a.id2};
Explain plan order by id1 :
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 11243 (100)| |
| 1 | NESTED LOOPS | | 1311K| 42M| 11243 (1)| 00:01:44 |
| 2 | MERGE JOIN CARTESIAN | | 46M| 842M| 11173 (1)| 00:01:43 |
|* 3 | TABLE ACCESS BY INDEX ROWID | TABLE1 | 58152 | 511K| 4745 (1)| 00:00:44 |
| 4 | INDEX FULL SCAN | TABLE1_ID1_IDX | 488K| | 15 (0)| 00:00:01 |
| 5 | BUFFER SORT | | 799 | 7990 | 6429 (1)| 00:01:00 |
| 6 | TABLE ACCESS BY INDEX ROWID| TABLE2 | 799 | 7990 | 1 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | TABLE2_COL_Y_IDX | 799 | | 1 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | IOT_TABLE_PK | 1 | 15 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
Explain plan order by id2 :
--------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 5159 (100)| |
| 1 | NESTED LOOPS | | 1311K| 42M| 5159 (2)| 00:00:48 |
| 2 | MERGE JOIN CARTESIAN | | 46M| 842M| 5089 (1)| 00:00:47 |
|* 3 | TABLE ACCESS BY INDEX ROWID | TABLE2 | 799 | 7990 | 2512 (1)| 00:00:24 |
| 4 | INDEX FULL SCAN | TABLE2_ID2_IDX | 248K| | 28 (0)| 00:00:01 |
| 5 | BUFFER SORT | | 58152 | 511K| 2577 (2)| 00:00:24 |
| 6 | TABLE ACCESS BY INDEX ROWID| TABLE1 | 58152 | 511K| 3 (0)| 00:00:01 |
|* 7 | INDEX RANGE SCAN | TABLE1_COL_X_IDX | 58152 | | 1 (0)| 00:00:01 |
|* 8 | INDEX UNIQUE SCAN | IOT_TABLE_PK | 1 | 15 | 1 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------
I am wondering why the cost is worse when sorting by id1, given that the rows should be primarily sorted by this column and Oracle would just have to walk the IOT B*-tree as it is.
Thank you for your help.
I use Oracle 11g R2 (11.2) and the stats are up to date.
