I am executing the queries below. One query gives me a result, but the other gives me [Error 2646] [SQLState HY000] No more spool space in USER.
SELECT DISTINCT PARTITION
FROM DB.TABLE
ORDER BY PARTITION ASC;
Error: [Error 2646] [SQLState HY000] No more spool space in USER.
But when I execute this version, it gives me results:
select * from (
SELECT DISTINCT PARTITION
FROM DB."TABLE") x
ORDER BY X.PARTITION ASC;
Teradata's optimizer usually checks if a DISTINCT can be rewritten using GROUP BY (and vice versa).
In your 1st query it chose DISTINCT processing (a redistribution followed by a sort) because of the ORDER BY (of course this is stupid).
Derived Tables using DISTINCT will not be folded, i.e. the optimizer will materialize them. Without the DISTINCT it applies the aggregate rewrite, which does an AMP-local aggregation as the 1st step, greatly reducing spool usage.
If you add COUNT(*) to #1 it will not spool out and return useful information :)
On the other hand, if I wanted to know about partitions with data I would query dbc.Stats...
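To make the "add COUNT(*)" suggestion concrete, here is a minimal sketch of the two query shapes. It runs against an in-memory SQLite database through Python's sqlite3 module, so it only shows that both forms return the same distinct values; it says nothing about Teradata spool usage or AMP-local aggregation. The table t and the column part_no are hypothetical stand-ins for DB.TABLE and PARTITION.

```python
import sqlite3

# Hypothetical stand-in for DB.TABLE; part_no plays the role of PARTITION.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (part_no INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")],
)

# Shape of query #1: plain DISTINCT with ORDER BY.
distinct_rows = conn.execute(
    "SELECT DISTINCT part_no FROM t ORDER BY part_no ASC"
).fetchall()

# Shape of the suggested variant: GROUP BY plus COUNT(*),
# which also reports how many rows each value has.
grouped_rows = conn.execute(
    "SELECT part_no, COUNT(*) FROM t GROUP BY part_no ORDER BY part_no ASC"
).fetchall()

print(distinct_rows)  # [(1,), (2,), (3,)]
print(grouped_rows)   # [(1, 2), (2, 1), (3, 3)]
```

The first column of the grouped result is the same list of values the DISTINCT query returns, with the row count per value as a bonus.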
Related
The query below crashes my DB Browser. Essentially I am trying to sum sales ("sales") by salesperson ("name") for sales that occurred between two dates ("beg_period" and "end_period") pulled from a separate table.
SELECT ta.name, ta.beg_period, ta.end_period,
(SELECT SUM(tb.sales)
FROM sales_log tb
WHERE ta.name = tb.name
AND tb.date BETWEEN ta.beg_period AND ta.end_period
)
FROM performance ta
;
The nested query can be re-written as a single query with a standard join.
SELECT ta.name, ta.beg_period, ta.end_period, SUM(tb.sales)
FROM performance ta INNER JOIN sales_log tb
ON ta.name = tb.name
WHERE tb.date BETWEEN ta.beg_period AND ta.end_period
GROUP BY ta.name, ta.beg_period, ta.end_period;
My guess is that the original query was okay (however inefficient), but DB Browser just didn't know how to interpret the subquery in whatever parsing it attempts. In other words, just because it crashed DB Browser doesn't mean that it would crash the sqlite library. Try another sqlite database manager.
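One way to check that the sqlite library itself handles both forms is to run them through Python's sqlite3 module. The sketch below uses made-up sample rows (the names 'ann' and 'bob' and all values are assumptions for illustration). It also shows one difference worth knowing: the INNER JOIN drops a salesperson with no sales in the period, while the correlated subquery returns that row with NULL. A LEFT JOIN with the date condition moved into the ON clause keeps it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: 'bob' has no sales at all,
# and one of ann's sales falls outside her period.
conn.executescript("""
CREATE TABLE performance (name TEXT, beg_period TEXT, end_period TEXT);
CREATE TABLE sales_log (name TEXT, date TEXT, sales REAL);
INSERT INTO performance VALUES
    ('ann', '2020-01-01', '2020-01-31'),
    ('bob', '2020-01-01', '2020-01-31');
INSERT INTO sales_log VALUES
    ('ann', '2020-01-05', 10.0),
    ('ann', '2020-01-20', 5.0),
    ('ann', '2020-02-02', 99.0);
""")

# Original form: correlated subquery in the SELECT list.
subquery_rows = conn.execute("""
    SELECT ta.name, ta.beg_period, ta.end_period,
           (SELECT SUM(tb.sales)
              FROM sales_log tb
             WHERE ta.name = tb.name
               AND tb.date BETWEEN ta.beg_period AND ta.end_period)
      FROM performance ta
     ORDER BY ta.name
""").fetchall()

# Rewritten form from the answer: INNER JOIN plus GROUP BY.
inner_rows = conn.execute("""
    SELECT ta.name, ta.beg_period, ta.end_period, SUM(tb.sales)
      FROM performance ta INNER JOIN sales_log tb
        ON ta.name = tb.name
     WHERE tb.date BETWEEN ta.beg_period AND ta.end_period
     GROUP BY ta.name, ta.beg_period, ta.end_period
     ORDER BY ta.name
""").fetchall()

# Variant that keeps salespeople without sales, like the subquery does.
left_rows = conn.execute("""
    SELECT ta.name, ta.beg_period, ta.end_period, SUM(tb.sales)
      FROM performance ta LEFT JOIN sales_log tb
        ON ta.name = tb.name
       AND tb.date BETWEEN ta.beg_period AND ta.end_period
     GROUP BY ta.name, ta.beg_period, ta.end_period
     ORDER BY ta.name
""").fetchall()

print(subquery_rows)
print(inner_rows)
print(left_rows)
```

If every salesperson is guaranteed to have sales in their period, the INNER JOIN version is enough; otherwise the LEFT JOIN variant matches the original output.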
I have an aggregate query, properly indexed to return ordered results fast (simple index scan). This works as expected when the ordering is ascending (ASC), but reversing the order (DESC) results in sqlite creating a TEMP B-TREE.
sqlite version 3.26.0
CREATE TABLE t1(x,y);
INSERT INTO t1 VALUES(1,1);
INSERT INTO t1 VALUES(1,2);
INSERT INTO t1 VALUES(2,1);
CREATE INDEX ix1 ON t1(x,y);
EXPLAIN QUERY PLAN SELECT x,max(y) FROM t1 GROUP BY x ORDER BY x;
EXPLAIN QUERY PLAN SELECT x,max(y) FROM t1 GROUP BY x ORDER BY x DESC; -- This query constructs a TEMP B-TREE, why?
When running the above code you will see query 1 simply running an index scan, while query 2, in addition to running an index scan, also builds a TEMP B-TREE to order the result, destroying performance.
The index supports traversal in both directions, so I would expect the same performance for both ASC and DESC ordering.
Is this a known limitation in sqlite and aggregates, or am I expecting/doing something wrong?
The code for looping over an index goes backwards only when needed. For implementing GROUP BY itself, going backwards is never needed, so it is never tried.
In reaction to your report on the sqlite-users mailing list, SQLite version 3.30.0 will have code to handle this case:
/* The GROUP BY processing doesn't care whether rows are delivered in
** ASC or DESC order - only that each group is returned contiguously.
** So set the ASC/DESC flags in the GROUP BY to match those in the
** ORDER BY to maximize the chances of rows being delivered in an
** order that makes the ORDER BY redundant. */
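To see which behavior a given SQLite build has, the repro from the question can be run through Python's sqlite3 module. This is a minimal sketch; the helper name plan is my own, and the exact plan wording differs between SQLite versions, so the script prints the plans rather than assuming their text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1(x, y);
INSERT INTO t1 VALUES (1, 1);
INSERT INTO t1 VALUES (1, 2);
INSERT INTO t1 VALUES (2, 1);
CREATE INDEX ix1 ON t1(x, y);
""")


def plan(sql):
    """Return the detail text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


asc_sql = "SELECT x, max(y) FROM t1 GROUP BY x ORDER BY x"
desc_sql = "SELECT x, max(y) FROM t1 GROUP BY x ORDER BY x DESC"

print("SQLite library version:", sqlite3.sqlite_version)
for sql in (asc_sql, desc_sql):
    print(sql)
    for step in plan(sql):
        print("   ", step)

# The results are the same either way; only the plan may differ.
asc_rows = conn.execute(asc_sql).fetchall()
desc_rows = conn.execute(desc_sql).fetchall()
```

On a library older than 3.30.0, the DESC plan should include a "USE TEMP B-TREE FOR ORDER BY" step, as described in the question; on newer versions that step is expected to disappear.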
There is no join in the query; it is a simple query with two COUNT(DISTINCT ...) expressions, but it consumes more than 9k CPU.
I have collected the necessary stats, but I am unable to reduce the CPU. Please suggest some good methods to reduce it.
Can you please let me know the best way to reduce the impact CPU?
I think the target table is a SET table so your query is taking a lot of CPU (duplicate row elimination).
1) Test your select query on a MULTISET table.
insert into multiset_table
select count(distinct col1) from source_table;
I also believe that your primary index is skewed, which would be the reason for the high impact CPU.
2) Make sure your primary index is unique.
select hashamp(hashbucket(hashrow(<primary index columns>))), count(*) (bigint) cnt from target_table group by 1 order by 2 desc;
If the cnt values are not evenly distributed across AMPs, change the primary index of the table to columns with more unique values.
Only 2 things can cause a merge to run slow:
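To put a number on "not distributed evenly", the per-AMP counts returned by the query above can be fed into a small helper. This is only a sketch: the function name skew_factor, the sample counts, and the formula (1 - average / maximum) * 100 are assumptions for illustration. The formula is a convention commonly used to describe Teradata skew, not something taken from this thread.

```python
def skew_factor(rows_per_amp):
    """Return (1 - avg/max) * 100 for a list of per-AMP row counts.

    0 means a perfectly even distribution; values approaching 100 mean
    most rows sit on a single AMP. Empty or all-zero input yields 0.0.
    """
    if not rows_per_amp or max(rows_per_amp) == 0:
        return 0.0
    average = sum(rows_per_amp) / len(rows_per_amp)
    return (1 - average / max(rows_per_amp)) * 100


# Hypothetical cnt values, one per AMP, as the skew query would return them.
even = [1000, 1010, 990, 1000]
skewed = [3700, 100, 100, 100]

print(round(skew_factor(even), 1))    # 1.0
print(round(skew_factor(skewed), 1))  # 73.0
```

A value near zero suggests the primary index distributes rows well; a large value suggests picking different primary index columns.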
1) Target table is SET table
2) Primary index of the target table is badly skewed
I need to run a query which joins 5 large tables on user_id and filters on proc_date.
I plan to partition on proc_date and also partition on user_id (5 range partitions) to improve query performance. I will keep the primary index on proc_date and user_id as well.
"But how can I run the query for just one partition of user_id at a time? I want to restrict the query to join the first partition (on user_id) of every table."
The reason is that once the query completes for the first partition, I can send the output data to the next process. While that process is running, I can run the query for the 2nd partition.
Could anyone please suggest a way to achieve this?
Something strange, whose cause I don't know, is happening when I try to collect results from a db2 database.
The query is the following:
SELECT
    COUNT(*)
FROM
    MYSCHEMA.TABLE1 T1
WHERE
    NOT EXISTS (
        SELECT
            *
        FROM
            MYSCHEMA.TABLE2 T2
        WHERE
            T2.PRIMARY_KEY_PART_1 = T1.PRIMARY_KEY_PART_2
            AND T2.PRIMARY_KEY_PART_2 = T1.PRIMARY_KEY_PART_2
    )
It is a very simple one.
The strange thing is that if I change COUNT(*) to * in this same query, I get 8 rows, but COUNT(*) returns only 2. I repeated the process several times and the strange result persists.
In this example, TABLE2 is a parent table of TABLE1, where the primary key of TABLE1 is PRIMARY_KEY_PART_1 and PRIMARY_KEY_PART_2, and the primary key of TABLE2 is PRIMARY_KEY_PART_1, PRIMARY_KEY_PART_2 and PRIMARY_KEY_PART_3.
There's no foreign key between them (because they are legacy tables) and they hold a huge amount of data.
The DB2 query SELECT VERSIONNUMBER FROM SYSIBM.SYSVERSIONS returns:
7020400
8020400
9010600
And the client used is SquirrelSQL 3.6 (without the rows limit marked).
So, what is the explanation to this strange result?
Without the details (including, at least, the exact Db2 version and the DDL for both tables and their indexes) it can be just about anything, and even with those details only IBM support will really be able to say what the actual reason is.
Generally this looks like damaged data (e.g. differences between index and table data).
It is worth opening a support case with IBM.