I have a table ("Table") with the following format:
A B C
"801331" "5755270" "0"
"1761861" "10556391" "1"
"1761861" "10557381" "33"
"1761861" "11069131" "33"
"801331" "24348751" "0"
"801331" "77219852" "0"
"1761861" "557880972" "0"
I would like to count two different quantities and present them in one table, grouped by column A.
The first is:
SELECT A, COUNT(*) FROM Table GROUP BY A
The second one has one condition:
SELECT A, COUNT(*) FROM Table WHERE C != 0 GROUP BY A
I want to get the following result:
A 1st 2nd
"1761861" "4" "3"
"801331" "3" "0"
I tried a few answers from questions such as this one, but I could not get it right because the result is always a single row.
I get that it is pretty easy, yet I cannot make it work.
Is there a (simple) way to do it?
Use conditional aggregation; either of these works:
SELECT A, sum(1), sum(case when C <> 0 then 1 else 0 end) FROM Table GROUP BY A;
SELECT A, count(*), count(case when C <> 0 then A else null end) FROM Table GROUP BY A;
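A quick way to check the conditional-aggregation approach is Python's built-in sqlite3 module with the sample rows from the question. This is a sketch: the table name t and storing C as INTEGER are assumptions (TABLE itself is an SQL keyword, so it cannot be used unquoted as a table name).

```python
import sqlite3

# Sample rows from the question; C is stored as INTEGER (an assumption).
rows = [
    ("801331", "5755270", 0),
    ("1761861", "10556391", 1),
    ("1761861", "10557381", 33),
    ("1761861", "11069131", 33),
    ("801331", "24348751", 0),
    ("801331", "77219852", 0),
    ("1761861", "557880972", 0),
]

con = sqlite3.connect(":memory:")
# "t" is a hypothetical table name standing in for "Table".
con.execute("CREATE TABLE t (A TEXT, B TEXT, C INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# COUNT(*) counts every row of the group; the CASE expression
# contributes 1 only for rows whose C is non-zero.
result = con.execute(
    """
    SELECT A,
           COUNT(*)                                AS all_rows,
           SUM(CASE WHEN C <> 0 THEN 1 ELSE 0 END) AS nonzero_c
    FROM t
    GROUP BY A
    ORDER BY A
    """
).fetchall()

for row in result:
    print(row)
```

ORDER BY A is only there to make the output order predictable; A is TEXT, so "1761861" sorts before "801331".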
I have a DB schema with two tables, one for Athletes and one for Results.
I'm trying to get the last (i.e. greatest) elapsed time of each athlete using this query:
Select Query
Select Athletes.BibNumber, Athletes.ChipNumber, Athletes.FirstName, Athletes.LastName, Athletes.Sex, Athletes.Category, count(Results.ElapsedTime) as Lapcount, Results.ElapsedTime
From Results, Athletes
Where Results.ChipNumber = Athletes.ChipNumber and Athletes.Category = 'A (Elite)' and Athletes.Sex = 'M' and Results.Active = 1
Group by Athletes.ChipNumber
Order by (Athletes.Sex = 'M') DESC, Athletes.Sex, Athletes.Category, Lapcount DESC, Results.ElapsedTime ASC;
This works if the times are added incrementally, but if I edit or add a time so that a record with a larger ID has a smaller time, the sort order is not applied.
Running the above query gives:
"1" "2018001" "User" "2" "M" "A (Elite)" "5" "00:00:00.000"
"2" "2018002" "User" "1" "M" "A (Elite)" "5" "01:18:09.923"
But I would like to have:
"1" "2018001" "User" "2" "M" "A (Elite)" "5" "01:11:51.384"
"2" "2018002" "User" "1" "M" "A (Elite)" "5" "01:18:09.923"
DB Schema
CREATE TABLE IF NOT EXISTS `Results` (
`ID` INTEGER PRIMARY KEY AUTOINCREMENT,
`ChipNumber` TEXT,
`ReaderTime` TEXT,
`Antenna` TEXT,
`ElapsedTime` TEXT,
`Active` INTEGER DEFAULT 0
);
INSERT INTO `Results` (ID,ChipNumber,ReaderTime,Antenna,ElapsedTime,Active) VALUES
(72354,'2018002','2018/07/29 12:01:39.000','Gun','00:00:00.000',1),
(72383,'2018001','2018/07/29 12:19:07.975','S3','00:17:28.974',1),
(72386,'2018002','2018/07/29 12:19:51.877','S3','00:18:12.876',1),
(72411,'2018001','2018/07/29 12:36:49.677','S3','00:35:10.676',1),
(72415,'2018002','2018/07/29 12:39:29.232','S3','00:37:50.231',1),
(72433,'2018001','2018/07/29 12:55:08.811','S3','00:53:29.810',1),
(72439,'2018002','2018/07/29 12:59:37.760','M3','00:57:58.759',1),
(72452,'2018001','2018/07/29 13:13:30.385','S3','01:11:51.384',1),
(72456,'2018002','2018/07/29 13:19:48.923','Manual','01:18:09.923',1),
(72465,'2018001','2018/07/29 12:01:39.000','Gun','00:00:00.000',1);
CREATE TABLE IF NOT EXISTS `Athletes` (
`ID` INTEGER PRIMARY KEY AUTOINCREMENT,
`FirstName` TEXT,
`LastName` TEXT,
`Sex` TEXT DEFAULT 'M',
`Category` TEXT DEFAULT NULL,
`BibNumber` INTEGER DEFAULT 0,
`ChipNumber` TEXT DEFAULT 0,
`Active` BOOLEAN DEFAULT 0
);
INSERT INTO `Athletes` (ID,FirstName,LastName,Sex,Category,BibNumber,ChipNumber,Active) VALUES
(3,'User','1','M','A (Elite)',2,'2018002',1),
(29,'User','2','M','A (Elite)',1,'2018001',1);
I believe that your issue is due to the following (note especially the part about non-aggregate expressions):
If the SELECT statement is an aggregate query without a GROUP BY
clause, then each aggregate expression in the result-set is evaluated
once across the entire dataset. Each non-aggregate expression in the
result-set is evaluated once for an arbitrarily selected row of the
dataset. The same arbitrarily selected row is used for each
non-aggregate expression. Or, if the dataset contains zero rows, then
each non-aggregate expression is evaluated against a row consisting
entirely of NULL values.
SQL As Understood By SQLite - SELECT - 3. Generation of the set of result rows.
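As such, to ensure that you get the maximum elapsed time, you should use an aggregate function, max() in your case. With a GROUP BY clause the same rule applies per group: each non-aggregate expression is evaluated against an arbitrarily chosen row of that group.
Therefore, I believe the following will work for you: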
SELECT Athletes.BibNumber, Athletes.ChipNumber, Athletes.FirstName, Athletes.LastName, Athletes.Sex, Athletes.Category,
count(Results.ElapsedTime) AS Lapcount,
max(Results.ElapsedTime) AS ElapsedTime
FROM Results JOIN Athletes ON Results.ChipNumber = Athletes.ChipNumber
WHERE Athletes.Category = 'A (Elite)' AND Athletes.Sex = 'M' AND Results.Active = 1
GROUP BY Athletes.ChipNumber
ORDER BY (Athletes.Sex = 'M') DESC, Athletes.Sex, Athletes.Category, Lapcount DESC, ElapsedTime ASC;
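A minimal reproduction with Python's built-in sqlite3 module, using the sample rows from the question. For brevity the tables are trimmed to the columns the query touches (an assumption; the real schema has more). The last Results row has the largest ID but a time of 00:00:00.000, which is what tripped up the original query; max() picks the greatest time regardless of insertion order.

```python
import sqlite3

# (ID, ChipNumber, ElapsedTime) from the question; all rows are Active = 1.
results = [
    (72354, "2018002", "00:00:00.000"),
    (72383, "2018001", "00:17:28.974"),
    (72386, "2018002", "00:18:12.876"),
    (72411, "2018001", "00:35:10.676"),
    (72415, "2018002", "00:37:50.231"),
    (72433, "2018001", "00:53:29.810"),
    (72439, "2018002", "00:57:58.759"),
    (72452, "2018001", "01:11:51.384"),
    (72456, "2018002", "01:18:09.923"),
    (72465, "2018001", "00:00:00.000"),  # edited row: largest ID, smallest time
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Results (ID INTEGER PRIMARY KEY, ChipNumber TEXT,"
            " ElapsedTime TEXT, Active INTEGER DEFAULT 0)")
con.executemany("INSERT INTO Results (ID, ChipNumber, ElapsedTime, Active)"
                " VALUES (?, ?, ?, 1)", results)
con.execute("CREATE TABLE Athletes (ID INTEGER PRIMARY KEY, FirstName TEXT,"
            " LastName TEXT, Sex TEXT DEFAULT 'M', Category TEXT,"
            " BibNumber INTEGER DEFAULT 0, ChipNumber TEXT)")
con.executemany("INSERT INTO Athletes VALUES (?, ?, ?, ?, ?, ?, ?)", [
    (3, "User", "1", "M", "A (Elite)", 2, "2018002"),
    (29, "User", "2", "M", "A (Elite)", 1, "2018001"),
])

# max() is an aggregate, so it sees every row of the group
# instead of one arbitrarily chosen row.
rows = con.execute(
    """
    SELECT Athletes.BibNumber, Athletes.ChipNumber, Athletes.FirstName,
           Athletes.LastName, Athletes.Sex, Athletes.Category,
           count(Results.ElapsedTime) AS Lapcount,
           max(Results.ElapsedTime)   AS ElapsedTime
    FROM Results JOIN Athletes ON Results.ChipNumber = Athletes.ChipNumber
    WHERE Athletes.Category = 'A (Elite)' AND Athletes.Sex = 'M'
      AND Results.Active = 1
    GROUP BY Athletes.ChipNumber
    ORDER BY Lapcount DESC, ElapsedTime ASC
    """
).fetchall()

for row in rows:
    print(row)
```

Comparing ElapsedTime as text works here only because every value has the same zero-padded HH:MM:SS.fff shape, so string order matches time order.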
I found that the query plan changes depending on whether or not the ranges table has keys.
CREATE TABLE `data` (
`name` TEXT,
`value` NUMERIC,
PRIMARY KEY(`name`)
) WITHOUT ROWID;
CREATE TABLE `ranges` (
`begin` TEXT,
`end` TEXT
);
explain query plan select distinct t1.name as name from data t1, ranges t2 where t1.name between t2.begin and t2.end order by name;
"0" "0" "1" "SCAN TABLE ranges AS t2"
"0" "1" "0" "SEARCH TABLE data AS t1 USING PRIMARY KEY (name>? AND name<?)"
"0" "0" "0" "USE TEMP B-TREE FOR DISTINCT"
If I define begin and end as a primary key,
CREATE TABLE `ranges` (
`begin` TEXT,
`end` TEXT,
PRIMARY KEY(`begin`,`end`)
);
the query plan changes to the following:
"0" "0" "0" "SCAN TABLE data AS t1"
"0" "1" "1" "SEARCH TABLE ranges AS t2 USING COVERING INDEX sqlite_autoindex_ranges_1 (begin<?)"
The first query plan is better because, in my case, the data table is much larger than ranges.
I read https://sqlite.org/optoverview.html. It says that, without ANALYZE results, the join order is determined by SQLite's default choice. Does adding those keys change the default choice? Is there any other trick to make SQLite use the first query plan without providing statistics data?
Also, is the default choice stable, or might it change in later versions? I use 3.22.
I also noticed that if I do not use ORDER BY and DISTINCT, it always uses the first plan:
explain query plan select t1.name as name from data t1, ranges t2 where t1.name between t2.begin and t2.end;
"0" "0" "1" "SCAN TABLE ranges AS t2"
"0" "1" "0" "SEARCH TABLE data AS t1 USING PRIMARY KEY (name>? AND name<?)"
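The Query Optimizer Overview linked above also describes a way to pin the join order without statistics: SQLite never reorders the tables of a CROSS JOIN, so the left table always stays in the outer loop. A sketch using Python's sqlite3 only to run EXPLAIN QUERY PLAN on the keyed schema from the question; the exact plan text depends on the SQLite version (3.22 prints "SCAN TABLE ranges AS t2", newer versions print "SCAN t2").

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE data (name TEXT, value NUMERIC, PRIMARY KEY (name)) WITHOUT ROWID;
    CREATE TABLE ranges ("begin" TEXT, "end" TEXT, PRIMARY KEY ("begin", "end"));
    """
)

# ranges is the left operand of CROSS JOIN, so it must be the outer loop;
# data is then probed through its primary key for each range.
plan = con.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT DISTINCT t1.name AS name
    FROM ranges t2 CROSS JOIN data t1
    WHERE t1.name BETWEEN t2."begin" AND t2."end"
    ORDER BY name
    """
).fetchall()

for row in plan:
    print(row[-1])  # the "detail" text is the last column in every version
```

The cost is that the query author, not the planner, now owns the join order, so this is only worth it when, as here, the relative table sizes are known.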
In SQLite, I have
create table tbl (id int, name text primary key);
create index tblIdIdx on tbl(id);
create table tblAttributes (id int, name text, value numeric, primary key(id, name));
When I do
explain query plan select name, value from tblAttributes where id = (select max(id) from tbl) and (name = 'n1' or name = 'n2' or name = 'n3');
I got the following results
selectid order from detail
"0" "0" "0" "SEARCH TABLE tblAttributes USING PRIMARY KEY (id=? AND name=?)"
"0" "0" "0" "EXECUTE SCALAR SUBQUERY 1"
"1" "0" "0" "SEARCH TABLE tbl USING COVERING INDEX tblIdIdx"
"0" "0" "0" "EXECUTE LIST SUBQUERY 2"
https://www.sqlite.org/eqp.html explains that "EXECUTE SCALAR" is a cached query. What does EXECUTE LIST mean? Is it also cached?
I have a WITH clause that groups some weather data by time interval and weather description:
With
temp_table (counter, hour, current_Weather_description) as
(
SELECT count(*) as counter,
CASE WHEN strftime('%M', time_stamp) < '30'
THEN cast(strftime('%H', time_stamp) as int)
ELSE cast(strftime('%H', time_stamp, '+1 hours') as int)
END as hour,
current_weather_description
FROM weather_events
GROUP BY strftime('%H', time_stamp, '+30 minutes'),
current_Weather_Description
order by hour desc
)
select *
from temp_table
Result {counter, hour, current_weather_description}:
"1" "10" "Cloudy"
"2" "9" "Clear"
"1" "9" "Meatballs"
"2" "8" "Rain"
"2" "7" "Clear"
"2" "6" "Clear"
"1" "5" "Clear"
"1" "5" "Cloudy"
"1" "4" "Clear"
"1" "4" "Rain"
"1" "3" "Rain"
"1" "3" "Snow"
"1" "2" "Rain"
Now I would like to write a recursive query that goes hour by hour, selecting the top row for each hour. The top row always holds the description with the highest occurrence (count) for that time interval; in case of a tie, it should still choose the top row.
Here's my first attempt:
With recursive
temp_table (counter, hour, current_Weather_description) as
(
SELECT count(*) as counter,
CASE WHEN strftime('%M', time_stamp) < '30'
THEN cast(strftime('%H', time_stamp) as int)
ELSE cast(strftime('%H', time_stamp, '+1 hours') as int)
END as hour,
current_weather_description
FROM weather_events
GROUP BY strftime('%H', time_stamp, '+30 minutes'),
current_Weather_Description
order by hour desc
),
segment (anchor_hour, hour, current_Weather_description) as
(
select cast(strftime('%H','2016-01-20 10:14:17') as int) as anchor_hour,
hour,
current_Weather_Description
from temp_table
where hour = anchor_hour
limit 1
union all
select segment.anchor_hour-1,
hour,
current_Weather_Description
from temp_table
where hour = anchor_hour - 1
limit 1
)
select *
from segment
From playing around with the query, it seems SQLite wants the FROM clause of my recursive member to reference "segment" instead of my temp_table. I don't understand why it wants me to do that. I'm trying to do something similar to this example, but I would like only one row from each recursive step.
This is the result I desire {count, hour, description}:
"1" "10" "Cloudy"
"2" "9" "Clear"
"2" "8" "Rain"
"2" "7" "Clear"
"2" "6" "Clear"
"1" "5" "Clear"
"1" "4" "Clear"
"1" "3" "Rain"
"1" "2" "Rain"
This can simply be done with another GROUP BY:
WITH
temp_table(counter, hour, current_Weather_description) AS (
...
),
segment(count, hour, description) AS (
SELECT MAX(counter),
hour,
current_Weather_description
FROM temp_table
GROUP BY hour
)
SELECT count, hour, description
FROM segment
ORDER BY hour DESC;
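(In SQLite, when MAX() is the only aggregate in the query, the other bare columns take their values from a row holding the maximum, so MAX() can be used to select entire rows from a group. If several rows tie, which one is used is arbitrary.)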
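A runnable sketch of the GROUP BY/MAX() answer using Python's built-in sqlite3 module. The raw weather_events rows are not in the question, so the sketch loads the grouped temp_table result shown above directly into a table (an assumption for illustration).

```python
import sqlite3

# The grouped rows from the question: (counter, hour, description).
grouped = [
    (1, 10, "Cloudy"), (2, 9, "Clear"), (1, 9, "Meatballs"),
    (2, 8, "Rain"), (2, 7, "Clear"), (2, 6, "Clear"),
    (1, 5, "Clear"), (1, 5, "Cloudy"), (1, 4, "Clear"),
    (1, 4, "Rain"), (1, 3, "Rain"), (1, 3, "Snow"), (1, 2, "Rain"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE temp_table (counter INTEGER, hour INTEGER,"
            " current_Weather_description TEXT)")
con.executemany("INSERT INTO temp_table VALUES (?, ?, ?)", grouped)

# With a single max() aggregate, SQLite fills the bare column
# current_Weather_description from a row that holds the maximum counter.
rows = con.execute(
    """
    SELECT MAX(counter) AS max_counter, hour, current_Weather_description
    FROM temp_table
    GROUP BY hour
    ORDER BY hour DESC
    """
).fetchall()

for row in rows:
    print(row)
```

For hours 5, 4 and 3 the counts tie, so which description comes back is up to SQLite; if the choice among ties matters, the query needs an explicit tiebreaker.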
The data is financial data, structured in buckets, where one bucket (a Rollup) can contain other buckets of data. An example structure:
Rollup1 | Dept1
Rollup1 | Rollup2 | Dept2
Rollup1 | Rollup2 | Dept3
Rollup1 | Rollup3 | Dept4
Rollup1 | Rollup3 | Rollup4 | Dept5
Rollup1 | Rollup3 | Rollup4 | Dept6
There are 8 such columns, with Rollups and Depts scattered throughout (but the leaves are always single Depts), and approximately 10k rows.
The goal of the query is to return a single column with ALL Rollups, using variables to decide which Rollups are presented normally and which are modified.
For example, if my variable contained "Dept4", my result would be:
Rollup1
Rollup3
NA - Rollup2
NA - Rollup4
In the real scenario, there are 3 variables which determine the display of the Rollup column.
Here is what I have. It works as it should, but the performance is VERY bad: one query takes up to 5 seconds, which I would like to improve.
SELECT DISTINCT CASE
WHEN "2" NOT IN
(
SELECT "2"
FROM "Finance New"
WHERE (#VAR3 = 'All' OR #VAR3 IN ("2","3","4","5","6","7","8","9"))
AND (#VAR4 = 'All' OR "10" = #VAR4)
AND (#VAR5 = 'All' OR "11" = #VAR5)
)
THEN
'Z N/A - ' || "2"
ELSE
"2"
END AS COL2
FROM "Finance New"
WHERE "5" <> 'All Applicable' AND "1" <> '9999'
AND "2" LIKE '9%'
UNION
SELECT DISTINCT CASE
WHEN "3" NOT IN
(
SELECT "3"
FROM "Finance New"
WHERE (#VAR3 = 'All' OR #VAR3 IN ("2","3","4","5","6","7","8","9"))
AND (#VAR4 = 'All' OR "10" = #VAR4)
AND (#VAR5 = 'All' OR "11" = #VAR5)
)
THEN
'Z N/A - ' || "3"
ELSE
"3"
END AS COL2
FROM "Finance New"
WHERE "5" <> 'All Applicable' AND "1" <> '9999'
AND "3" LIKE '9%'
UNION
Etc., for each of the columns in the Rollup/Dept tree report.
The inner SELECT in each UNION member decides, based on the variable criteria, whether to prefix the text. Sorting is done automatically. The last line before UNION (AND "3" LIKE '9%') is what actually selects the Rollups; Rollups all start with 9.
Input parameters are labeled like #VARx.
I'm wondering if there is a more efficient way of performing this, assuming I cannot create a temp table and cannot change the structure of the data.
Thank you!
All these ORs prevent the use of indexes.
If at all possible, remove those #VAR = 'All' comparisons and the ORs with them: create the SQL string dynamically, depending on the actual VAR3/4/5 values, so that it contains either no condition (for 'All') or only the plain comparison.
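A minimal sketch of building the filter dynamically, in Python for illustration. The host language of the reporting tool is unknown, and build_filter, the stand-in table contents and the way VAR3/4/5 are passed in are all hypothetical. Each 'All' value simply drops its condition, and the remaining values are bound as parameters rather than pasted into the SQL string.

```python
from __future__ import annotations

import sqlite3

TREE_COLS = ["2", "3", "4", "5", "6", "7", "8", "9"]  # the Rollup/Dept columns


def build_filter(var3: str, var4: str, var5: str) -> tuple[str, list[str]]:
    """Return (WHERE fragment, parameters); every 'All' filter is left out."""
    conds: list[str] = []
    params: list[str] = []
    if var3 != "All":
        cols = ", ".join(f'"{c}"' for c in TREE_COLS)
        conds.append(f"? IN ({cols})")
        params.append(var3)
    if var4 != "All":
        conds.append('"10" = ?')
        params.append(var4)
    if var5 != "All":
        conds.append('"11" = ?')
        params.append(var5)
    # "1" is an always-true WHERE clause for when nothing is filtered.
    return (" AND ".join(conds) if conds else "1"), params


# Usage against a tiny stand-in table with the same column names.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "Finance New" ("1","2","3","4","5","6","7","8","9","10","11")')
con.execute(
    'INSERT INTO "Finance New" VALUES (?,?,?,?,?,?,?,?,?,?,?)',
    ["1", "9001", "9002", "Dept4", None, None, None, None, None, "X", "Y"],
)
where, params = build_filter("Dept4", "X", "All")
subquery = f'SELECT "2" FROM "Finance New" WHERE {where}'
print(subquery, params)
matched = con.execute(subquery, params).fetchall()
```

The VAR3 test still spans eight columns, so on its own it cannot be served by a single index; dropping the VAR4/VAR5 ORs is what lets the ("10", "11", ...) indexes be used.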
The LIKE prevents the use of indexes (the LIKE/GLOB optimization would require, among other things, TEXT affinity on the indexed column).
Replace it with normal comparisons, i.e., replace "col" LIKE '9%' with "col" >= '9' AND "col" < ':' (':' is the character that follows '9' in ASCII).
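A small check of that rewrite with Python's sqlite3. The stand-in table and its values are hypothetical, and the columns are declared TEXT because the range trick assumes the Rollup/Dept values are stored as text, not numbers. Both predicates return the same rows here, and EXPLAIN QUERY PLAN shows the range form being answered with an index search.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical stand-in with only the columns needed for the demo.
con.execute('CREATE TABLE "Finance New" ("1" TEXT, "2" TEXT, "5" TEXT)')
con.executemany(
    'INSERT INTO "Finance New" VALUES (?, ?, ?)',
    [("1", "9001", "x"), ("2", "8001", "x"), ("3", "9", "x"),
     ("4", "Dept1", "x"), ("5", "99A", "x"), ("6", ":9", "x")],
)
con.execute('CREATE INDEX i_2 ON "Finance New"("2")')

# Prefix match with LIKE versus the equivalent half-open text range.
like_rows = con.execute(
    """SELECT "2" FROM "Finance New" WHERE "2" LIKE '9%' ORDER BY 1"""
).fetchall()
range_rows = con.execute(
    """SELECT "2" FROM "Finance New" WHERE "2" >= '9' AND "2" < ':' ORDER BY 1"""
).fetchall()

plan = con.execute(
    """EXPLAIN QUERY PLAN SELECT "2" FROM "Finance New" WHERE "2" >= '9' AND "2" < ':'"""
).fetchall()
details = [row[-1] for row in plan]  # "detail" is the last column

print(range_rows)
print(details)
```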
The UNION already removes duplicates; drop the DISTINCTs.
Without indexes, all these queries do full table scans.
Create the following (covering) indexes:
CREATE INDEX i_10_11_all on "Finance New"("10","11", "2","3","4","5","6","7","8","9");
CREATE INDEX i_11_10_all on "Finance New"("11","10", "2","3","4","5","6","7","8","9");
CREATE INDEX i_2_1_5 on "Finance New"("2", "1","5");
CREATE INDEX i_3_1_5 on "Finance New"("3", "1","5");
-- and so on for 4..9