recursive CTE from Ordinary CTE - sqlite

I have a with clause that groups some weather data by time intervals and weather descriptions:
With
temp_table (counter, hour, current_Weather_description) as
(
SELECT count(*) as counter,
CASE WHEN strftime('%M', time_stamp) < '30'
THEN cast(strftime('%H', time_stamp) as int)
ELSE cast(strftime('%H', time_stamp, '+1 hours') as int)
END as hour,
current_weather_description
FROM weather_events
GROUP BY strftime('%H', time_stamp, '+30 minutes'),
current_Weather_Description
order by hour desc
)
select *
from temp_table
Result {counter, hour, current_weather_description}:
"1" "10" "Cloudy"
"2" "9" "Clear"
"1" "9" "Meatballs"
"2" "8" "Rain"
"2" "7" "Clear"
"2" "6" "Clear"
"1" "5" "Clear"
"1" "5" "Cloudy"
"1" "4" "Clear"
"1" "4" "Rain"
"1" "3" "Rain"
"1" "3" "Snow"
"1" "2" "Rain"
Now I would like to write a recursive query that goes hour by hour, selecting the top row. The top row always holds the description with the highest occurrence (count) for that time interval; in case of a tie, it should still choose the top row.
Here's my first attempt:
With recursive
temp_table (counter, hour, current_Weather_description) as
(
SELECT count(*) as counter,
CASE WHEN strftime('%M', time_stamp) < '30'
THEN cast(strftime('%H', time_stamp) as int)
ELSE cast(strftime('%H', time_stamp, '+1 hours') as int)
END as hour,
current_weather_description
FROM weather_events
GROUP BY strftime('%H', time_stamp, '+30 minutes'),
current_Weather_Description
order by hour desc
),
segment (anchor_hour, hour, current_Weather_description) as
(
select cast(strftime('%H','2016-01-20 10:14:17') as int) as anchor_hour,
hour,
current_Weather_Description
from temp_table
where hour = anchor_hour
limit 1
union all
select segment.anchor_hour-1,
hour,
current_Weather_Description
from temp_table
where hour = anchor_hour - 1
limit 1
)
select *
from segment
From playing around with the query, it seems that the recursive member must select from "segment" rather than from my temp_table, and I don't understand why. I'm trying to do something similar to this example, but I want only one row from each recursive step.
This is the result I desire {count, hour, description}:
"1" "10" "Cloudy"
"2" "9" "Clear"
"2" "8" "Rain"
"2" "7" "Clear"
"2" "6" "Clear"
"1" "5" "Clear"
"1" "4" "Clear"
"1" "3" "Rain"
"1" "2" "Rain"

This can simply be done with another GROUP BY:
WITH
temp_table(counter, hour, current_Weather_description) AS (
...
),
segment(count, hour, description) AS (
SELECT MAX(counter),
hour,
current_Weather_description
FROM temp_table
GROUP BY hour
)
SELECT count, hour, description
FROM segment
ORDER BY hour DESC;
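(In SQLite 3.7.11 and later, when a query contains a single MAX() or MIN(), the other columns take their values from the row that holds that maximum or minimum. If several rows tie for the maximum, SQLite picks one of them arbitrarily.)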
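The GROUP BY approach can be tried out with a minimal sketch like the one below. It assumes Python's built-in sqlite3 module and, to stay short, loads the question's intermediate result directly into a table named temp_table instead of recomputing it from weather_events.

```python
import sqlite3

# Rows copied from the question's intermediate result: (counter, hour, description).
rows = [
    (1, 10, "Cloudy"), (2, 9, "Clear"), (1, 9, "Meatballs"), (2, 8, "Rain"),
    (2, 7, "Clear"), (2, 6, "Clear"), (1, 5, "Clear"), (1, 5, "Cloudy"),
    (1, 4, "Clear"), (1, 4, "Rain"), (1, 3, "Rain"), (1, 3, "Snow"),
    (1, 2, "Rain"),
]

conn = sqlite3.connect(":memory:")
# Assumption: a real table stands in for the temp_table CTE of the question.
conn.execute(
    "CREATE TABLE temp_table "
    "(counter INT, hour INT, current_Weather_description TEXT)"
)
conn.executemany("INSERT INTO temp_table VALUES (?, ?, ?)", rows)

# One row per hour; the description comes from the row holding MAX(counter).
top_rows = conn.execute(
    """
    SELECT MAX(counter), hour, current_Weather_description
    FROM temp_table
    GROUP BY hour
    ORDER BY hour DESC
    """
).fetchall()

for row in top_rows:
    print(row)
```

For hours where two descriptions tie (5, 4 and 3 in this data), which description is returned is up to SQLite, so it may differ from the "top row" of the original listing.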

Related

How to solve the Error -> Error: near "then": syntax error

I used the sqldf package in R.
I tried:
project=sqldf('select distinct
ID
, sex
, case when age<30 then "1" --20s
when age<40 then "2" --30s
when age<50 then "3" --40s
when age<60 then "4" --50s
when age<70 then "5" --60s
when 70<=age then "6" --70s
end as age
, ho_incm
, town_t
, case when tins in (10, 20) then "1" --National Health Insurance
when tins=30 then "2" --Medical Aid
end as tins
, edu
, marri_1 in (1, 2) then marri_1 --1: married, 2: divorced
end as marri_1
, case when EC1_1 in (1, 2) then EC1_1 --1: employed, 2: unemployed
end as EC1_1
, case when BS3_1 in (1, 2) then "1" --current smoker
when BS3_1=3 then "2" --former smoker
when BS3_1=8 then "3" --non-smoker
end as BS3_1
, case when BD1_11 in (1, 2) then "2" --non-drinker
when BD1_11 in (3, 4, 5, 6) then "1" --drinker
end as BD1_11
, case when BP1 in (1, 2) then "1" --high stress
when BP1 in (3, 4) then "2" --low stress
end as BP1
, case when HE_BMI<18.5 then "1" --underweight
when 18.5<=HE_BMI<23 then "2" --normal
when 23<=HE_BMI<25 then "3" --overweight
when 25<=HE_BMI then "4" --obese
end as HE_BMI
, case when DL1_dg in (0, 1) then DL1_dg --0: no, 1: yes
end as DL1_dg
FROM data
WHERE AGE >= 19
')
But then I got an error message
Error: near "then": syntax error
Any help will be appreciated.
The error comes from the marri_1 column, which is missing its "case when", so the parser hits "then" unexpectedly. SQL also has no chained comparisons: 18.5<=HE_BMI<23 is evaluated as (18.5<=HE_BMI)<23, which is true for every non-NULL value, so each branch should test only its upper bound. A corrected query:
project=sqldf('select distinct
ID
, sex
, case when age<30 then "1" --20s
when age<40 then "2" --30s
when age<50 then "3" --40s
when age<60 then "4" --50s
when age<70 then "5" --60s
when 70<=age then "6" --70s
end as age
, ho_incm
, town_t
, case when tins in (10, 20) then "1" --National Health Insurance
when tins=30 then "2" --Medical Aid
end as tins
, edu
, case when marri_1 in (1, 2) then marri_1 --1: married, 2: divorced
end as marri_1
, case when EC1_1 in (1, 2) then EC1_1 --1: employed, 2: unemployed
end as EC1_1
, case when BS3_1 in (1, 2) then "1" --current smoker
when BS3_1=3 then "2" --former smoker
when BS3_1=8 then "3" --non-smoker
end as BS3_1
, case when BD1_11 in (1, 2) then "2" --non-drinker
when BD1_11 in (3, 4, 5, 6) then "1" --drinker
end as BD1_11
, case when BP1 in (1, 2) then "1" --high stress
when BP1 in (3, 4) then "2" --low stress
end as BP1
, case when HE_BMI<18.5 then "1" --underweight
when HE_BMI<23 then "2" --normal (earlier branch already excluded <18.5)
when HE_BMI<25 then "3" --overweight
when 25<=HE_BMI then "4" --obese
end as HE_BMI
, case when DL1_dg in (0, 1) then DL1_dg --0: no, 1: yes
end as DL1_dg
FROM data
WHERE AGE >= 19
')

table join order is changed between w/ and w/o keys

I found that the query plan changes depending on whether the ranges table is defined with or without keys.
CREATE TABLE `data` (
`name` TEXT,
`value` NUMERIC,
PRIMARY KEY(`name`)
) WITHOUT ROWID;
CREATE TABLE `ranges` (
`begin` TEXT,
`end` TEXT
);
explain query plan select distinct t1.name as name from data t1, ranges t2 where t1.name between t2.begin and t2.end order by name;
"0" "0" "1" "SCAN TABLE ranges AS t2"
"0" "1" "0" "SEARCH TABLE data AS t1 USING PRIMARY KEY (name>? AND name<?)"
"0" "0" "0" "USE TEMP B-TREE FOR DISTINCT"
If I define begin and end as keys,
CREATE TABLE `ranges` (
`begin` TEXT,
`end` TEXT,
PRIMARY KEY(`begin`,`end`)
);
the query plan is changed to the following.
"0" "0" "0" "SCAN TABLE data AS t1"
"0" "1" "1" "SEARCH TABLE ranges AS t2 USING COVERING INDEX sqlite_autoindex_ranges_1 (begin<?)"
The first query plan is better because, in my case, the data table is much larger than ranges.
I read https://sqlite.org/optoverview.html. It says that, without analysis results, the join order is determined by SQLite's default choice. Does adding those keys change the default choice? Is there any other trick to make SQLite use the first query plan without providing statistics?
Also, is the default choice stable, or may it change in later versions? I use 3.22.
I also noticed that if I do not use ORDER BY and DISTINCT, it always uses the first plan:
explain query plan select t1.name as name from data t1, ranges t2 where t1.name between t2.begin and t2.end;
"0" "0" "1" "SCAN TABLE ranges AS t2"
"0" "1" "0" "SEARCH TABLE data AS t1 USING PRIMARY KEY (name>? AND name<?)"

What is execute list subquery

In SQLite, I have
create table tbl (id int, name text primary key);
create index tblIdIdx on tbl(id);
create table tblAttributes (id int, name text, value numeric, primary key(id, name));
When I do
explain query plan select name, value from tblAttributes where id = (select max(id) from tbl) and (name = 'n1' or name = 'n2' or name = 'n3');
I got the following results
selectid order from detail
"0" "0" "0" "SEARCH TABLE tblAttributes USING PRIMARY KEY (id=? AND name=?)"
"0" "0" "0" "EXECUTE SCALAR SUBQUERY 1"
"1" "0" "0" "SEARCH TABLE tbl USING COVERING INDEX tblIdIdx"
"0" "0" "0" EXECUTE LIST SUBQUERY 2"
https://www.sqlite.org/eqp.html explains that "EXECUTE SCALAR" is a cached query. What does EXECUTE LIST mean? Is it also cached?

Combine two counts in SQLite

I have a table ("Table") in the following format:
A B C
"801331" "5755270" "0"
"1761861" "10556391" "1"
"1761861" "10557381" "33"
"1761861" "11069131" "33"
"801331" "24348751" "0"
"801331" "77219852" "0"
"1761861" "557880972" "0"
And I would like to count and present two different quantities in one table grouped by column A.
The first is:
SELECT A, COUNT(*) FROM Table GROUP BY A
The second one has one condition:
SELECT A, COUNT(*) FROM Table WHERE C != 0 GROUP BY A
I want to have the following result
A 1st 2nd
"1761861" "4" "3"
"801331" "3" "0"
I tried a few answers from questions such as this one, yet I could not make it work, as the result is a single row.
I understand this is probably easy, yet I cannot make it work.
Is there a (simple) way to do it?
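Use conditional aggregation. Either of these works (Table is a reserved word in SQLite, so it is quoted here):
SELECT A, sum(1), sum(case when C <> 0 then 1 else 0 end) FROM "Table" GROUP BY A;
SELECT A, count(*), count(case when C <> 0 then A else null end) FROM "Table" GROUP BY A;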
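As a quick check, the first query can be run against the sample rows from the question. This is a minimal sketch assuming Python's built-in sqlite3 module, integer column types, and the quoted table name "Table"; the column aliases total and nonzero are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Table" is a reserved word in SQLite, so the name must be quoted.
conn.execute('CREATE TABLE "Table" (A INT, B INT, C INT)')
conn.executemany(
    'INSERT INTO "Table" VALUES (?, ?, ?)',
    [
        (801331, 5755270, 0),
        (1761861, 10556391, 1),
        (1761861, 10557381, 33),
        (1761861, 11069131, 33),
        (801331, 24348751, 0),
        (801331, 77219852, 0),
        (1761861, 557880972, 0),
    ],
)

# One pass over the table: total rows per A, and rows per A where C is non-zero.
counts = conn.execute(
    """
    SELECT A,
           sum(1) AS total,
           sum(CASE WHEN C <> 0 THEN 1 ELSE 0 END) AS nonzero
    FROM "Table"
    GROUP BY A
    ORDER BY A DESC
    """
).fetchall()

for row in counts:
    print(row)
```

Because the condition lives inside the aggregate rather than in a WHERE clause, groups with no matching rows still appear, with a count of 0.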

SQLite - Help to Make Complicated Query More Efficient

The data is financial data structured in buckets, where one bucket (Rollup) can contain other buckets of data. Example structure:
Rollup1 | Dept1
Rollup1 | Rollup2 | Dept2
Rollup1 | Rollup2 | Dept3
Rollup1 | Rollup3 | Dept4
Rollup1 | Rollup3 | Rollup4 | Dept5
Rollup1 | Rollup3 | Rollup4 | Dept6
There are 8 columns of this, with Rollups and Depts scattered throughout (but the leaves are always single Depts). The table has approximately 10k rows.
The goal of the query is to return a single column with ALL Rollups, where variable-driven logic shows certain Rollups unchanged and marks all other Rollups.
For example, if my variable contained "Dept4", my result would be:
Rollup1
Rollup3
NA - Rollup2
NA - Rollup4
In the real scenario, there are 3 variables which determine the display of the Rollup column.
Here is what I have. It works as it should, but the performance is very bad: one query takes up to 5 seconds, which I would like to improve.
SELECT DISTINCT CASE
WHEN "2" NOT IN
(
SELECT "2"
FROM "Finance New"
WHERE (#VAR3 = 'All' OR #VAR3 IN ("2","3","4","5","6","7","8","9"))
AND (#VAR4 = 'All' OR "10" = #VAR4)
AND (#VAR5 = 'All' OR "11" = #VAR5)
)
THEN
'Z N/A - ' || "2"
ELSE
"2"
END AS COL2
FROM "Finance New"
WHERE "5" <> 'All Applicable' AND "1" <> '9999'
AND "2" LIKE '9%'
UNION
SELECT DISTINCT CASE
WHEN "3" NOT IN
(
SELECT "3"
FROM "Finance New"
WHERE (#VAR3 = 'All' OR #VAR3 IN ("2","3","4","5","6","7","8","9"))
AND (#VAR4 = 'All' OR "10" = #VAR4)
AND (#VAR5 = 'All' OR "11" = #VAR5)
)
THEN
'Z N/A - ' || "3"
ELSE
"3"
END AS COL2
FROM "Finance New"
WHERE "5" <> 'All Applicable' AND "1" <> '9999'
AND "3" LIKE '9%'
UNION
Etc, for each of the columns in the Rollup/Dept Tree report.
The inner select in each union query appends to the text based on the variable criteria. Sorting is done automatically. The last line before UNION (AND "3" LIKE "9%") is to actually grab the Rollup. Rollups all start with 9.
Input parameters are labeled like #VARx.
I'm wondering if there is a more efficient way of performing this, assuming I cannot create a temp table and cannot change the structure of the data.
Thank you!
All these ORs prevent the use of indexes.
If at all possible, drop the #VAR = 'All' comparisons and build the SQL string dynamically, so that each query contains only the conditions that the actual VAR3/4/5 values require.
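A minimal sketch of such dynamic construction, assuming Python on the calling side. The function name build_inner_where and the use of bound ? parameters in place of the #VARx substitution are illustrative choices, not part of the original setup.

```python
def build_inner_where(var3, var4, var5):
    """Return (where_clause, params) for the inner SELECT.

    Hypothetical helper: a condition is emitted only when its variable
    is not 'All', so no OR is needed in the generated SQL.
    """
    clauses, params = [], []
    if var3 != "All":
        clauses.append('? IN ("2","3","4","5","6","7","8","9")')
        params.append(var3)
    if var4 != "All":
        clauses.append('"10" = ?')
        params.append(var4)
    if var5 != "All":
        clauses.append('"11" = ?')
        params.append(var5)
    # With no restrictions at all, fall back to a condition that is always true.
    where = " AND ".join(clauses) if clauses else "1"
    return where, params


where, params = build_inner_where("Dept4", "All", "All")
inner_sql = 'SELECT "2" FROM "Finance New" WHERE ' + where
print(inner_sql, params)
```

The returned clause and parameter list would then be passed together to whatever executes the query.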
The LIKE prevents the use of indexes (because LIKE (or GLOB) would require TEXT affinity on the indexed column).
Replace this with normal comparisons, i.e., replace "col" LIKE '9%' with "col" >= '9' AND "col" < ':'.
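That the range comparison selects the same rows as LIKE '9%' can be checked with a small sketch. It assumes Python's built-in sqlite3 module, a throwaway table t with a TEXT column col, and made-up sample values; the equivalence relies on the values being stored as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col TEXT)")
# Hypothetical sample values: some start with 9, some do not.
values = ["9000", "9ABC", "9", "90", "8999", "19", "Dept4", ":x"]
conn.executemany("INSERT INTO t VALUES (?)", [(v,) for v in values])

like_rows = sorted(
    r[0] for r in conn.execute("SELECT col FROM t WHERE col LIKE '9%'")
)
# ':' is the character directly after '9' in ASCII, so this range
# covers exactly the strings that begin with '9'.
range_rows = sorted(
    r[0] for r in conn.execute("SELECT col FROM t WHERE col >= '9' AND col < ':'")
)

print(like_rows)
print(range_rows)
```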
The UNION already removes duplicates; drop the DISTINCTs.
Without indexes, all these queries do full table scans.
Create the following (covering) indexes:
CREATE INDEX i_10_11_all on "Finance New"("10","11", "2","3","4","5","6","7","8","9");
CREATE INDEX i_11_10_all on "Finance New"("11","10", "2","3","4","5","6","7","8","9");
CREATE INDEX i_2_1_5 on "Finance New"("2", "1","5");
CREATE INDEX i_3_1_5 on "Finance New"("3", "1","5");
-- and so on for 4..9
