I have been looking for solutions to a question similar to this one:
https://forums.asp.net/t/1887910.aspx?Inner+join+2+tables+but+return+all+if+1+table+empty
Someone proposed a solution like the following:
SELECT
*
FROM
TableA AS a
LEFT JOIN TableB AS b ON b.A_Id = a.A_Id
WHERE
b.A_Id IS NOT null
OR
NOT EXISTS (SELECT TOP 1 A_Id FROM TableB)
Does SQLite run NOT EXISTS (SELECT TOP 1 A_Id FROM TableB) for each row? This condition is the same for all rows, so I was wondering whether the query engine could hoist it out of the per-row evaluation. Otherwise, it would be very expensive.
I tried the following
EXPLAIN QUERY PLAN
select test.testID, test.name, parameter.name
from test
left join parameter
on test.testID = parameter.testID
where
parameter.testID is not null
or not exists (select testId from parameter limit 1)
It reports
selectid  order  from  detail
0         0      0     SCAN TABLE test (~1M rows)
0         1      1     SEARCH TABLE parameter USING AUTOMATIC COVERING INDEX (testID=?) (~7 rows)
0         0      0     EXECUTE SCALAR SUBQUERY 1
1         0      0     SCAN TABLE parameter (~1M rows)
Does this mean SQLite does not hoist (select testId from parameter limit 1)?
The documentation says:
A correlated subquery is reevaluated each time its result is required. An uncorrelated subquery is evaluated only once and the result reused as necessary.
Your subquery is not correlated, because it does not reference any columns of the outer query, so it is evaluated only once and the result is reused, despite appearing as a separate step in the plan. (If it were correlated, the EXPLAIN QUERY PLAN output would show EXECUTE CORRELATED SCALAR SUBQUERY ....)
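If you want the evaluate-once behavior to be explicit in the query itself, you could compute the emptiness flag in its own single-row CTE. A minimal sketch against the question's TableA/TableB (the CTE name is made up):
WITH b_empty AS (
    SELECT NOT EXISTS (SELECT 1 FROM TableB) AS is_empty
)
SELECT a.*, b.*
FROM TableA AS a
LEFT JOIN TableB AS b ON b.A_Id = a.A_Id
WHERE b.A_Id IS NOT NULL
   OR (SELECT is_empty FROM b_empty);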
Related
I found behavior of the RANDOM() function in SQLite that doesn't seem correct.
I want to generate random groups using RANDOM() and CASE. However, it looks like the CTE is not behaving correctly.
First, let's create a table
DROP TABLE IF EXISTS tt10ROWS;
CREATE TEMP TABLE tt10ROWS (
some_int INTEGER);
INSERT INTO tt10ROWS VALUES
(1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
SELECT * FROM tt10ROWS;
Incorrect behaviour
WITH
-- 2.a add columns with random number and save in CTE
STEP_01 AS (
SELECT
*,
ABS(RANDOM()) % 4 + 1 AS RAND_1_TO_4
FROM tt10ROWS)
-- 2.b - get random group
select
*,
CASE
WHEN RAND_1_TO_4 = 1 THEN 'GROUP_01'
WHEN RAND_1_TO_4 = 2 THEN 'GROUP_02'
WHEN RAND_1_TO_4 = 3 THEN 'GROUP_03'
WHEN RAND_1_TO_4 = 4 THEN 'GROUP_04'
END AS GROUP_IT
from STEP_01;
With this query we get a table whose RAND_1_TO_4 column contains correct values, but the GROUP_IT column is wrong: the group names don't match the numbers, and some rows end up with no group at all (NULL).
Correct behaviour
I found a workaround for this problem by creating a temporary table instead of using a CTE. It helped.
-- 1.a - add column with random number 1-4 and save as TEMP TABLE
drop table if exists ttSTEP01;
CREATE TEMP TABLE ttSTEP01 AS
SELECT
*,
ABS(RANDOM()) % 4 + 1 AS RAND_1_TO_4
FROM tt10ROWS;
-- 1.b - get random group
select
*,
CASE
WHEN RAND_1_TO_4 = 1 THEN 'GROUP_01'
WHEN RAND_1_TO_4 = 2 THEN 'GROUP_02'
WHEN RAND_1_TO_4 = 3 THEN 'GROUP_03'
WHEN RAND_1_TO_4 = 4 THEN 'GROUP_04'
END AS GROUP_IT
from ttSTEP01;
QUESTION
What is the reason behind this behaviour, where the GROUP_IT column is not generated properly?
If you look at the bytecode generated by the incorrect query using EXPLAIN, you'll see that every time the RAND_1_TO_4 column is referenced, its value is recalculated and a new random number is used (I suspect, but am not 100% sure, that this has something to do with RANDOM() being a non-deterministic function). The NULL values appear on rows where none of the CASE tests ends up being true against its own freshly drawn number.
When you insert into a temporary table and then use that for the rest, the values of course remain static and it works as expected.
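Another workaround, if you are on SQLite 3.35.0 or later: the MATERIALIZED hint asks the planner to compute the CTE once into a transient table, so each row's random number is drawn only once. A sketch of the same query with the hint (assuming SQLite 3.35+):
WITH STEP_01 AS MATERIALIZED (
    SELECT
        *,
        ABS(RANDOM()) % 4 + 1 AS RAND_1_TO_4
    FROM tt10ROWS)
SELECT
    *,
    CASE
        WHEN RAND_1_TO_4 = 1 THEN 'GROUP_01'
        WHEN RAND_1_TO_4 = 2 THEN 'GROUP_02'
        WHEN RAND_1_TO_4 = 3 THEN 'GROUP_03'
        WHEN RAND_1_TO_4 = 4 THEN 'GROUP_04'
    END AS GROUP_IT
FROM STEP_01;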
Suppose I have three tables:
h     y     t
----  ----  -----------
id    id    id   name
----  ----  -----------
1     1     1    john
2     2     2    alex
3     8     6    maggie
and I have a query like this:
select t.*, (select y.id from (select * from h where h.id > t.id) y) x from t
The problem is that I can't use t.id in the inner query. I want to know what the problem is and what the solution is.
I'm using this query in Oracle 11g.
You can only refer to tables (or their aliases) from an outer query one level deep. So t is not in scope at the innermost level and is not recognised.
Your query will only work if there is a single h record with a higher ID than each t record anyway - which seems unlikely; otherwise the subquery will return too many rows.
You don't need nested, or any, subqueries here. You have more levels than you need anyway. For this example you can just do:
select t.*, h.id from t join h on h.id > t.id
But since your example data and query don't match up it's hard to tell what you really need.
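If what you actually want is a single h value per t row, a correlated subquery is fine as long as it stays one level deep. A minimal sketch (picking the smallest h.id above each t.id; the column alias is made up):
select t.*,
       (select min(h.id) from h where h.id > t.id) as next_h_id
from t;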
I have a SQLite table which contains a numeric field field_name. I need to group by ranges of this column, something like this: SELECT CAST(field_name/100 AS INT), COUNT(*) FROM table GROUP BY CAST(field_name/100 AS INT), but including ranges which have no values (the COUNT for them should be 0). I can't work out how to write such a query.
You can do this by using a join and (though kludgy) an extra table.
The extra table would contain each of the values you want a row for in the response to your query. This not only fills in missing CAST(field_name/100 AS INT) values between your returned values, but also lets you expand the range: if your current groups were 5, 6, 7, you could include 0 through 10.
In other flavors of SQL you'd be able to use a RIGHT JOIN or FULL OUTER JOIN and be on your way. SQLite only gained these in version 3.39.0, so on older versions they're not available.
Accordingly, we'll use a cross join (join everything to everything) and then filter. If you've got a relatively small database or a small number of groups, you're in good shape. If you have large numbers of both, this will be a very intensive way to go about this (the cross join result will have #ofRowsOfData * #ofGroups rows, so watch out).
Example:
TABLE: groups_for_report
desired_group
-------------
0
1
2
3
4
5
6
Table: data
fieldname other_field
--------- -----------
250 somestuff
230 someotherstuff
600 stuff
you would use a query like
select groups_for_report.desired_group, count(data.fieldname)
from data
cross join groups_for_report
where CAST(fieldname/100.0 AS INT)=desired_group
group by desired_group;
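One caveat: because the WHERE clause filters the cross join, any group with no matching data rows disappears from the result instead of showing a count of 0. A LEFT JOIN from the groups table fixes that and avoids the cross-join blowup; a sketch with the same tables:
select groups_for_report.desired_group, count(data.fieldname)
from groups_for_report
left join data
    on CAST(data.fieldname/100.0 AS INT) = groups_for_report.desired_group
group by groups_for_report.desired_group;
count(data.fieldname) ignores the NULLs produced for unmatched groups, so those rows report 0.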
The Transact-SQL COUNT(DISTINCT ...) operation counts all distinct non-null values in a column. I need to count the number of distinct values per column in a set of tables, including null values (so if there is a null in the column, the result should be (Select Count(Distinct COLNAME) From TABLE) + 1).
This is going to be repeated over every column in every table in the DB, which includes hundreds of tables, some with over 1M rows. Because this needs to be done over every single column, adding indexes for every column is not a good option.
This will be done as part of an ASP.net site, so integration with code logic is also ok (i.e.: this doesn't have to be completed as part of one query, though if that can be done with good performance, then even better).
What is the most efficient way to do this?
Update After Testing
I tested the different methods from the answers given on a good representative table. The table has 3.2 million records, dozens of columns (a few with indexes, most without). One column has 3.2 million unique values. Other columns range from all Null (one value) to a max of 40K unique values. For each method I performed four tests (with multiple attempts at each, averaging the results): 20 columns at one time, 5 columns at one time, 1 column with many values (3.2M) and 1 column with a small number of values (167). Here are the results, in order of fastest to slowest
Count/GroupBy (Cheran)
CountDistinct+SubQuery (Ellis)
dense_rank (Eriksson)
Count+Max (Andriy)
Testing Results (in seconds):
Method            20_Columns   5_Columns   1_Column (Large)   1_Column (Small)
1) Count/GroupBy  10.8         4.8         2.8                0.14
2) CountDistinct  12.4         4.8         3                  0.7
3) dense_rank     226          30          6                  4.33
4) Count+Max      98.5         44          16                 12.5
Notes:
Interestingly enough, the two methods that were fastest (by far, with only a small difference between them) were both methods that submitted separate queries for each column (and in the case of result #2, the query included a subquery, so there were really two queries submitted per column). Perhaps the gains that would be achieved by limiting the number of table scans are small in comparison to the performance hit taken in terms of memory requirements (just a guess).
Though the dense_rank method is definitely the most elegant, it seems that it doesn't scale well (see the result for 20 columns, which is by far the worst of the four methods), and even on a small scale just cannot compete with the performance of Count.
Thanks for the help and suggestions!
SELECT COUNT(*)
FROM (SELECT ColumnName
FROM TableName
GROUP BY ColumnName) AS s;
GROUP BY selects distinct values including NULL. COUNT(*) will include NULLs, as opposed to COUNT(ColumnName), which ignores NULLs.
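A quick illustration of that difference, with throwaway data (the temp table here is made up for the example):
CREATE TABLE #t (v INT);
INSERT INTO #t VALUES (1), (2), (NULL);

SELECT COUNT(*) FROM (SELECT v FROM #t GROUP BY v) AS s; -- 3: the NULL group is counted
SELECT COUNT(DISTINCT v) FROM #t;                        -- 2: NULL is ignored

DROP TABLE #t;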
I think you should try to keep the number of table scans down and count all columns in one table in one go. Something like this could be worth trying.
;with C as
(
select dense_rank() over(order by Col1) as dnCol1,
dense_rank() over(order by Col2) as dnCol2
from YourTable
)
select max(dnCol1) as CountCol1,
max(dnCol2) as CountCol2
from C
Test the query at SE-Data
A development on OP's own solution:
SELECT
COUNT(DISTINCT acolumn) + MAX(CASE WHEN acolumn IS NULL THEN 1 ELSE 0 END)
FROM atable
Run one query that counts the number of distinct values and adds 1 if there are any NULLs in the column (using a subquery):
Select Count(Distinct COLUMNNAME) +
Case When Exists
(Select * from TABLENAME Where COLUMNNAME is Null)
Then 1 Else 0 End
From TABLENAME
You can try:
count(
  distinct coalesce(
    your_table.column_1, your_table.column_2
    -- cast them if the two columns are not the same type
  )
) as COUNT_TEST
The COALESCE function combines the two columns by substituting values from the second wherever the first is NULL. I used this in my case and it gave the correct result.
Not sure this would be the fastest, but it might be worth testing. Use CASE to give NULL a value; clearly you would need to pick a value for NULL that does not occur in the real data. According to the query plan, this is a dead heat with the COUNT(*)/GROUP BY solution proposed by Cheran S.
SELECT
COUNT( distinct
(case when [testNull] is null then 'dbNullValue' else [testNull] end)
)
FROM [test].[dbo].[testNullVal]
With this approach you can also count more than one column:
SELECT
COUNT( distinct
(case when [testNull1] is null then 'dbNullValue' else [testNull1] end)
),
COUNT( distinct
(case when [testNull2] is null then 'dbNullValue' else [testNull2] end)
)
FROM [test].[dbo].[testNullVal]
I have SQLite database and I have in it certain column of type "double".
I want to get the row whose value in this column is closest to a specified one.
For example, in my table I have:
id: 1; value: 47
id: 2; value: 56
id: 3; value: 51
And I want to get a row that has its value closest to 50. So I want to receive id: 3 (value = 51).
How can I achieve this goal?
Thanks.
Using an ORDER BY on an expression like this, SQLite will scan the entire table and load all the values into a temporary b-tree to order them, making any index useless. This will be very slow and use a lot of memory on large tables:
explain query plan select * from 'table' order by abs(10 - value) limit 1;
0|0|0|SCAN TABLE table
0|0|0|USE TEMP B-TREE FOR ORDER BY
You can get the next lower or higher value using the index like this:
select min(value) from 'table' where value >= N;
select max(value) from 'table' where value <= N;
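These searches are only fast when the value column is indexed; the value_index that appears in the query plans below would be created with something like:
CREATE INDEX value_index ON 'table'(value);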
And you can use union to get both from a single query:
explain query plan
select min(value) from 'table' where value >= 10
union select max(value) from 'table' where value <= 10;
1|0|0|SEARCH TABLE table USING COVERING INDEX value_index (value>?)
2|0|0|SEARCH TABLE table USING COVERING INDEX value_index (value<?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 USING TEMP B-TREE (UNION)
This will be pretty fast even on large tables. You could simply load both values and evaluate them in your code, or use even more SQL to select one in various ways:
explain query plan select v from
( select min(value) as v from 'table' where value >= 10
union select max(value) as v from 'table' where value <= 10)
order by abs(10-v) limit 1;
2|0|0|SEARCH TABLE table USING COVERING INDEX value_index (value>?)
3|0|0|SEARCH TABLE table USING COVERING INDEX value_index (value<?)
1|0|0|COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE (UNION)
0|0|0|SCAN SUBQUERY 1
0|0|0|USE TEMP B-TREE FOR ORDER BY
or
explain query plan select 10+v from
( select min(value)-10 as v from 'table' where value >= 10
union select max(value)-10 as v from 'table' where value <= 10)
group by v having max(abs(v)) limit 1;
2|0|0|SEARCH TABLE table USING COVERING INDEX value_index (value>?)
3|0|0|SEARCH TABLE table USING COVERING INDEX value_index (value<?)
1|0|0|COMPOUND SUBQUERIES 2 AND 3 USING TEMP B-TREE (UNION)
0|0|0|SCAN SUBQUERY 1
0|0|0|USE TEMP B-TREE FOR GROUP BY
Since you're interested in values both arbitrarily greater and less than the target, you can't avoid doing two index searches. If you know that the target is within a small range, though, you could use "between" to only hit the index once:
explain query plan select * from 'table' where value between 9 and 11 order by abs(10-value) limit 1;
0|0|0|SEARCH TABLE table USING COVERING INDEX value_index (value>? AND value<?)
0|0|0|USE TEMP B-TREE FOR ORDER BY
This will be around 2x faster than the union query above when it only evaluates 1-2 values, but if you start having to load more data it will quickly become slower.
This should work:
SELECT * FROM table
ORDER BY ABS(? - value)
LIMIT 1
Where ? represents the value you want to compare against.