SELECT in SELECT - sqlite

I have the following query which I am trying to rewrite:
SELECT
max(dpHigh) AS High
FROM DailyPrices
WHERE dpTicker = 'DL.AS'
AND dpDate IN
(SELECT
dpDate
FROM DailyPrices
WHERE dpTicker ='DL.AS'
ORDER BY dpDate DESC
LIMIT 10);
The query gives me the required result:
bash-3.2$ sqlite3 myData < Queries/high.sql
High
----------
4.67
bash-3.2$
Besides the high value, I wish to expand this query to also obtain a low value, earliest date, latest date, etc. For this reason, I am trying to rewrite it as an equivalent query using a select-in-select statement.
SELECT
(SELECT
max(dpHigh)
FROM DailyPrices
WHERE dpTicker = 'DL.AS'
AND dpDate IN
(SELECT dpDate
FROM DailyPrices
WHERE dpTicker ='DL.AS'
ORDER BY dpDate DESC
LIMIT 10)
)AS High
FROM DailyPrices
WHERE dpTicker = 'DL.AS';
Executing the query outputs the expected value; however, it repeats it once for every data entry of 'DL.AS'.
...
4.67
4.67
4.67
4.67
4.67
4.67
4.67
bash-3.2$
Since I am a SQLite newbie, I am probably overlooking the obvious. Does anybody have any suggestions?
BR
GAM

The outermost query looks like this:
SELECT (...)
FROM DailyPrices
WHERE dpTicker = 'DL.AS';
This will generate one output row for each table row with a matching dpTicker.
To generate a single row, regardless of how many rows might be found in some table, use a query without a FROM (the filtering and aggregation are already handled in the subqueries):
SELECT (...) AS High,
(...) AS Low;
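As a minimal sketch of this FROM-less form, assuming Python's built-in sqlite3 module and a small made-up DailyPrices table (the rows, the XX.AS ticker and the LIMIT of 3 are illustrative, not the asker's data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DailyPrices (dpTicker TEXT, dpDate TEXT, dpHigh REAL, dpLow REAL)"
)
conn.executemany(
    "INSERT INTO DailyPrices VALUES (?, ?, ?, ?)",
    [
        ("DL.AS", "2016-06-08", 4.474, 4.322),
        ("DL.AS", "2016-06-09", 4.43, 4.221),
        ("DL.AS", "2016-06-10", 4.236, 4.05),
        ("DL.AS", "2016-06-13", 4.009, 3.885),
        ("XX.AS", "2016-06-13", 9.9, 0.1),  # hypothetical other ticker, must be ignored
    ],
)

# Dates of the 3 most recent DL.AS rows (the question uses LIMIT 10).
recent = """
    SELECT dpDate FROM DailyPrices
    WHERE dpTicker = 'DL.AS'
    ORDER BY dpDate DESC LIMIT 3
"""

# The outer SELECT has no FROM clause, so exactly one row comes back.
query = f"""
    SELECT
      (SELECT max(dpHigh) FROM DailyPrices
        WHERE dpTicker = 'DL.AS' AND dpDate IN ({recent})) AS High,
      (SELECT min(dpLow) FROM DailyPrices
        WHERE dpTicker = 'DL.AS' AND dpDate IN ({recent})) AS Low
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Each scalar subquery does its own filtering and aggregation, so further values (earliest date, latest date, and so on) can be added as extra columns in the same way.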

Related

mariadb alternative to outer apply or lateral?

What I wanted was to use CROSS APPLY, but I guess that doesn't exist in mysql. The alternative I've read is LATERAL. Well, I'm using mariadb 10.3 and I guess that doesn't exist either. The ticket table contains an id that's referenced by the ticket_id column in the note table. A ticket can have many notes, I'm trying to list all tickets with their most recent note date (post_date). How could I write the query below for mariadb?
SELECT t.*, n.post_date
FROM ticket t,
LATERAL (
SELECT note.post_date FROM note WHERE t.id = note.ticket_id ORDER BY note.post_date DESC LIMIT 1
) n;
Example table structure:
ticket
id  subject
--  -------
1   stuff
2   more
note
id  post_date  ticket_id
--  ---------  ---------
1              1
2              1
3              2
4              1
5              2
I did find an open jira ticket from people asking for mariadb to support lateral.
From what I read, LATERAL will not be supported in MariaDB until version 11. But we can just as easily use ROW_NUMBER here, which is supported:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY ticket_id ORDER BY post_date DESC) rn
FROM note
)
SELECT t.*, n.post_date
FROM ticket t
INNER JOIN cte n
ON n.ticket_id = t.id
WHERE n.rn = 1;
If you wanted a close translation of your current lateral join, then use:
SELECT t.*,
(SELECT n.post_date
FROM note n
WHERE n.ticket_id = t.id
ORDER BY n.post_date DESC
LIMIT 1) AS post_date
FROM ticket t;
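A small sketch of the correlated-subquery form, with every column qualified through the alias n. It runs against an in-memory SQLite database as a stand-in for MariaDB (an assumption made only so the example is self-contained), and the post_date values are made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (id INTEGER PRIMARY KEY, subject TEXT);
    CREATE TABLE note (id INTEGER PRIMARY KEY, post_date TEXT, ticket_id INTEGER);
    INSERT INTO ticket VALUES (1, 'stuff'), (2, 'more');
    -- post_date values are hypothetical
    INSERT INTO note VALUES
        (1, '2023-01-01', 1), (2, '2023-01-05', 1), (3, '2023-01-02', 2),
        (4, '2023-01-09', 1), (5, '2023-01-03', 2);
""")

# One scalar subquery per ticket row: newest note date for that ticket.
latest = conn.execute("""
    SELECT t.id, t.subject,
           (SELECT n.post_date
              FROM note n
             WHERE n.ticket_id = t.id
             ORDER BY n.post_date DESC
             LIMIT 1) AS post_date
      FROM ticket t
     ORDER BY t.id
""").fetchall()
print(latest)
```

Unlike the inner join against the CTE, this form keeps tickets that have no notes; their post_date is simply NULL.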

Spool space error when inserting large result set to table

I have a SQL query in Teradata that returns a result set of ~160m rows in (I guess) a reasonable time: depending on how good a day the server is having, it runs for between 10 and 60 minutes.
I recently got access to space to save it as a table; however, using my initial query with the "insert into" command I get error 2646 (no more spool).
The query structure is:
insert into <test_DB.tablename>
with smaller_dataset as
(
select
*
from
(
select
items
,case items
from
<Database.table>
QUALIFY ROW_NUMBER() OVER (PARTITION BY A,B ORDER BY C desc , LAST_UPDATE_DTM DESC) = 1
where 1=1
and other things
) T --irrelevant alias for subquery
QUALIFY ROW_NUMBER() OVER (PARTITION BY A, B ORDER BY C desc) = 1)
, employee_table as
(
select
items
,max(J1.field1) J1_field1
,max(J2.field1) J2_field1
,max(J3.field1) J3_field1
,max(J4.field1) J4_field1
from smaller_dataset S
self joins J1,J2,J3,J4
group by
non-aggregate items
)
select
items
,case items
from employee_table
;
How can I break up the return into smaller chunks to prevent this error?

SQLite: Summary data of a query result

I have the following query that provides me with the 10 most recent records in the database:
SELECT
dpDate AS Date,
dpOpen AS Open,
dpHigh AS High,
dpLow AS Low,
dpClose AS Close
FROM DailyPrices
WHERE dpTicker = 'DL.AS'
ORDER BY dpDate DESC
LIMIT 10;
The result of this query is as follows:
bash-3.2$ sqlite3 myData < Queries/dailyprice.sql
Date Open High Low Close
---------- ---------- ---------- ---------- ----------
2016-06-13 4.0 4.009 3.885 3.933
2016-06-10 4.23 4.236 4.05 4.08
2016-06-09 4.375 4.43 4.221 4.231
2016-06-08 4.406 4.474 4.322 4.35
2016-06-07 4.377 4.466 4.369 4.384
2016-06-06 4.327 4.437 4.321 4.353
2016-06-03 4.34 4.428 4.316 4.335
2016-06-02 4.434 4.51 4.403 4.446
2016-06-01 4.51 4.512 4.317 4.399
2016-05-31 4.613 4.67 4.502 4.526
bash-3.2$
Whilst I need to plot the extracted data, I also need to obtain the following summary data of the dataset:
Minimum date ==> 2016-05-31
Maximum date ==> 2016-06-13
Open value at minimum date ==> 4.613
Close value at maximum date ==> 3.933
Maximum of High column ==> 4.67
Minimum of Low column ==> 3.885
How can I, as newbie, approach this issue? Can this be done in one query?
Thanks for pointing me in the right direction.
Best regards,
GAM
The desired output can be achieved by combining two things: aggregate functions on a convenient common table expression, which uses OP's expression verbatim; and OP's method, with LIMIT 1 applied to the common table expression, for getting the values at mindate and maxdate among the ten days.
Query:
WITH Ten(Date,Open,High,Low,Close) AS
(SELECT dpDate AS Date,
dpOpen AS Open,
dpHigh AS High,
dpLow AS Low,
dpClose AS Close
FROM DailyPrices
WHERE dpTicker = 'DL.AS'
ORDER BY dpDate DESC LIMIT 10)
SELECT min(Date) AS mindate,
max(Date) AS maxdate,
(SELECT Open FROM Ten ORDER BY Date ASC LIMIT 1) AS Open,
max(High) AS High,
min(Low) AS Low,
(SELECT Close FROM Ten ORDER BY Date DESC LIMIT 1) AS Close
FROM Ten;
Output (.headers on and .mode column):
mindate maxdate Open High Low Close
---------- ---------- ---------- ---------- ---------- ----------
2016-05-31 2016-06-13 4.613 4.67 3.885 3.933
Note:
I think the order of values in OP's last comment does not match the order of columns in OP's preceding comment.
I chose the order from the preceding comment.
The order in the last comment seems to me to be "mindate, maxdate, Open, Close, High, Low".
Adapting my proposed query to that order would be simple.
Using SQLite 3.18.0 2017-03-28 18:48:43
Here is the .dump of my toy database, i.e. my MCVE, in case something is unclear. (I did not enter the many decimal places, it is probably a float rounding thing.)
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE dailyPrices (dpDate date, dpOpen float, dpHigh float, dpLow float, dpClose float, dpTicker varchar(10));
INSERT INTO dailyPrices(dpDate,dpOpen,dpHigh,dpLow,dpClose,dpTicker) VALUES('2016-06-13',4.0,4.009000000000000341,3.8849999999999997868,3.9329999999999998294,'DL.AS');
INSERT INTO dailyPrices(dpDate,dpOpen,dpHigh,dpLow,dpClose,dpTicker) VALUES('2016-06-10',4.2300000000000004263,4.2359999999999997655,4.0499999999999998223,4.080000000000000071,'DL.AS');
INSERT INTO dailyPrices(dpDate,dpOpen,dpHigh,dpLow,dpClose,dpTicker) VALUES('2016-06-09',4.375,4.4299999999999997157,4.2210000000000000852,4.2309999999999998721,'DL.AS');
INSERT INTO dailyPrices(dpDate,dpOpen,dpHigh,dpLow,dpClose,dpTicker) VALUES('2016-06-08',4.4059999999999996944,4.4740000000000001989,4.3220000000000000639,4.3499999999999996447,'DL.AS');
INSERT INTO dailyPrices(dpDate,dpOpen,dpHigh,dpLow,dpClose,dpTicker) VALUES('2016-06-07',4.3769999999999997797,4.4660000000000001918,4.3689999999999997726,4.384000000000000341,'DL.AS');
INSERT INTO dailyPrices(dpDate,dpOpen,dpHigh,dpLow,dpClose,dpTicker) VALUES('2016-06-06',4.3269999999999999573,4.4370000000000002771,4.3209999999999997299,4.3529999999999997584,'DL.AS');
INSERT INTO dailyPrices(dpDate,dpOpen,dpHigh,dpLow,dpClose,dpTicker) VALUES('2016-06-03',4.3399999999999998578,4.4370000000000002771,4.3209999999999997299,4.3529999999999997584,'DL.AS');
INSERT INTO dailyPrices(dpDate,dpOpen,dpHigh,dpLow,dpClose,dpTicker) VALUES('2016-06-02',4.4340000000000001634,4.5099999999999997868,4.4029999999999995807,4.4459999999999997299,'DL.AS');
INSERT INTO dailyPrices(dpDate,dpOpen,dpHigh,dpLow,dpClose,dpTicker) VALUES('2016-06-01',4.5099999999999997868,4.5119999999999995665,4.3170000000000001705,4.3990000000000000213,'DL.AS');
INSERT INTO dailyPrices(dpDate,dpOpen,dpHigh,dpLow,dpClose,dpTicker) VALUES('2016-05-31',4.6130000000000004334,4.6699999999999999289,4.5019999999999997797,4.525999999999999801,'DL.AS');
COMMIT;
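For readers who would rather check the query without the sqlite3 shell, here is a sketch using Python's built-in sqlite3 module. It loads the ten rows shown in the question plus one made-up older row (2016-05-30), which the LIMIT 10 inside the CTE should exclude:

```python
import sqlite3

ten_days = [
    ("2016-06-13", 4.0, 4.009, 3.885, 3.933),
    ("2016-06-10", 4.23, 4.236, 4.05, 4.08),
    ("2016-06-09", 4.375, 4.43, 4.221, 4.231),
    ("2016-06-08", 4.406, 4.474, 4.322, 4.35),
    ("2016-06-07", 4.377, 4.466, 4.369, 4.384),
    ("2016-06-06", 4.327, 4.437, 4.321, 4.353),
    ("2016-06-03", 4.34, 4.428, 4.316, 4.335),
    ("2016-06-02", 4.434, 4.51, 4.403, 4.446),
    ("2016-06-01", 4.51, 4.512, 4.317, 4.399),
    ("2016-05-31", 4.613, 4.67, 4.502, 4.526),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DailyPrices (dpDate TEXT, dpOpen REAL, dpHigh REAL, "
    "dpLow REAL, dpClose REAL, dpTicker TEXT)"
)
conn.executemany("INSERT INTO DailyPrices VALUES (?, ?, ?, ?, ?, 'DL.AS')", ten_days)
# Hypothetical eleventh row, older than the ten-day window.
conn.execute("INSERT INTO DailyPrices VALUES ('2016-05-30', 5.0, 5.5, 3.0, 4.6, 'DL.AS')")

summary = conn.execute("""
    WITH Ten(Date, Open, High, Low, Close) AS
      (SELECT dpDate, dpOpen, dpHigh, dpLow, dpClose
         FROM DailyPrices
        WHERE dpTicker = 'DL.AS'
        ORDER BY dpDate DESC LIMIT 10)
    SELECT min(Date), max(Date),
           (SELECT Open FROM Ten ORDER BY Date ASC LIMIT 1),
           max(High), min(Low),
           (SELECT Close FROM Ten ORDER BY Date DESC LIMIT 1)
      FROM Ten
""").fetchone()
print(summary)
```

The resulting tuple follows the column order of the output above: mindate, maxdate, Open, High, Low, Close.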

Monthly Price data: Open, High, Low and Close

I have a table that is built up as follows:
dpTicker dpDate dpOpen dpHigh dpLow dpClose dpVolume dpAdjClose dpCreated dpModified
GLE.PA 2016-02-01 35.39 35.455 34.375 34.785 2951300 34.785 2016-02-06 13:33:40 2016-02-06 13:33:40
GLE.PA 2016-02-02 34.515 34.565 32.165 32.575 7353600 32.575 2016-02-06 13:33:40 2016-02-06 13:33:40
GLE.PA 2016-02-03 32.4 32.495 30.885 31.6 7007000 31.6 2016-02-06 13:33:40 2016-02-06 13:33:40
GLE.PA 2016-02-04 32.075 32.38 30.67 31.98 8181000 31.98 2016-02-06 13:33:40 2016-02-06 13:33:40
GLE.PA 2016-02-05 32.55 33.0 31.86 32.11 7056700 32.11 2016-02-06 13:33:40 2016-02-06 13:33:40
The data is daily share price information and the table contains hundreds of tickers (eg GLE.PA). Each ticker (eg GLE.PA) has an entry for each "Business Day".
My objective is to query Monthly Price Summaries from this Daily Price data table. The monthly data is constructed as follows:
Month Open: dpOpen at the first business day of the month;
Month high: max(dpHigh) of the month;
Month Low: min(dpLow) of the month;
Month Close: dpClose at the last business date of the month.
I manage to query the data for a specific month by using the following query in SQLite3:
SELECT
strftime ('%Y-%m', dpDate) AS month,
(SELECT dpOpen
FROM DailyPrices
WHERE dpTicker = 'GLE.PA'
AND dpDate =
(SELECT min(dpDate)
FROM DailyPrices
WHERE strftime('%Y%m', dpDate) = '201509'
)
) AS Open,
max(dpHigh) AS High,
min (dpLow) AS Low,
(SELECT dpClose
FROM DailyPrices
WHERE dpTicker = 'GLE.PA'
AND dpDate =
(SELECT max(dpDate)
FROM DailyPrices
WHERE strftime('%Y%m', dpDate) = '201509'
)
) AS Close
FROM DailyPrices
WHERE dpTicker ='GLE.PA'
AND strftime('%Y%m', dpDate) = '201509';
The output of the query is as follows:
bash-3.2$ sqlite3 myShares < month.sql
month Open High Low Close
---------- ---------- ---------- ---------- ----------
2015-09 42.72 44.07 37.25 39.85
bash-3.2$
With the following query I manage to generate a monthly overview for the High and Low:
SELECT
strftime('%Y-%m', dpDate) AS Month,
max(dpHigh) AS High,
min(dpLow) AS Low
FROM DailyPrices
WHERE dpTicker ='GLE.PA'
GROUP BY strftime('%Y%m', dpDate);
A snapshot of the output looks as follows:
bash-3.2$ sqlite3 myShares < monthly.sql
Month High Low
---------- ---------- ----------
2000-01 219.32 184.346
2000-02 206.43 181.977
2000-03 210.411 181.503
2000-04 221.405 197.805
2000-05 226.239 55.9199
...
With the following query, I manage to extract the correct Open and by analogy the correct Close data:
SELECT
strftime('%Y-%m', dpDate) AS Month,
dpOpen AS Open
FROM DailyPrices
WHERE dpTicker = 'GLE.PA'
AND dpDate IN
(SELECT min(dpDate)
FROM DailyPrices
WHERE dpTicker = 'GLE.PA'
GROUP BY strftime('%Y%m', dpDate)
);
A snapshot of the output is as follows:
bash-3.2$ sqlite3 myShares < Open.sql
Month Open
---------- ----------
2000-01 218.846
2000-02 200.269
2000-03 206.525
2000-04 201.312
2000-05 215.908
...
I am struggling to combine the queries, month.sql and open.sql, into one query to obtain the following output:
Month Open High Low Close
------- ----- ----- ----- -----
2015-01 42.79 42.79 33.69 35.18
2015-02 35.39 35.46 26.61 32.42
2015-03 32.32 37.65 31.93 32.48
...
Any help in solving this question would be highly appreciated.
Best Regards
Gam
There are three places where the first query refers to the specific month searched for. Let's remove the two occurrences in the subqueries; this requires using aliases so that we can refer to other instances of the same table by name:
SELECT
strftime ('%Y-%m', dpDate) AS month,
(SELECT dpOpen
FROM DailyPrices
WHERE dpTicker = 'GLE.PA'
AND dpDate =
(SELECT min(dpDate)
FROM DailyPrices AS DP2
WHERE strftime('%Y%m', DP2.dpDate) = strftime('%Y%m', DP1.dpDate)
)
) AS Open,
max(dpHigh) AS High,
min (dpLow) AS Low,
(SELECT dpClose
FROM DailyPrices
WHERE dpTicker = 'GLE.PA'
AND dpDate =
(SELECT max(dpDate)
FROM DailyPrices AS DP2
WHERE strftime('%Y%m', DP2.dpDate) = strftime('%Y%m', DP1.dpDate)
)
) AS Close
FROM DailyPrices AS DP1
WHERE dpTicker ='GLE.PA'
AND strftime('%Y%m', dpDate) = '201509';
Now that only the outermost query needs to know the month, we can simply replace the filter with GROUP BY:
SELECT
strftime ('%Y-%m', dpDate) AS month,
(...) AS Open,
max(dpHigh) AS High,
min (dpLow) AS Low,
(...) AS Close
FROM DailyPrices
WHERE dpTicker ='GLE.PA'
GROUP BY strftime('%Y%m', dpDate);
Please note that the open/close subquery can be simplified by using ORDER BY/LIMIT:
(SELECT dpOpen
FROM DailyPrices
WHERE dpTicker = 'GLE.PA'
AND ... dpDate ...
ORDER BY dpDate ASC
LIMIT 1) AS Open
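Putting the pieces together, a sketch of the grouped query with the ORDER BY/LIMIT form of the Open and Close subqueries could look as follows. It assumes Python's built-in sqlite3 module; the February rows come from the question, while the January rows and the OTH.PA ticker are made up. Note one deliberate difference from the snippets above: the subqueries here also match the ticker (DP2.dpTicker = DP1.dpTicker), so only the searched ticker's own trading days are considered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DailyPrices (dpTicker TEXT, dpDate TEXT, dpOpen REAL, "
    "dpHigh REAL, dpLow REAL, dpClose REAL)"
)
conn.executemany(
    "INSERT INTO DailyPrices VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("GLE.PA", "2016-01-04", 40.0, 41.0, 39.0, 40.5),   # hypothetical
        ("GLE.PA", "2016-01-29", 36.0, 36.5, 35.0, 35.2),   # hypothetical
        ("GLE.PA", "2016-02-01", 35.39, 35.455, 34.375, 34.785),
        ("GLE.PA", "2016-02-02", 34.515, 34.565, 32.165, 32.575),
        ("GLE.PA", "2016-02-03", 32.4, 32.495, 30.885, 31.6),
        ("OTH.PA", "2016-01-01", 1.0, 2.0, 0.5, 1.5),       # hypothetical other ticker
    ],
)

monthly = conn.execute("""
    SELECT strftime('%Y-%m', DP1.dpDate) AS month,
           (SELECT DP2.dpOpen
              FROM DailyPrices AS DP2
             WHERE DP2.dpTicker = DP1.dpTicker
               AND strftime('%Y%m', DP2.dpDate) = strftime('%Y%m', DP1.dpDate)
             ORDER BY DP2.dpDate ASC
             LIMIT 1) AS Open,
           max(DP1.dpHigh) AS High,
           min(DP1.dpLow) AS Low,
           (SELECT DP2.dpClose
              FROM DailyPrices AS DP2
             WHERE DP2.dpTicker = DP1.dpTicker
               AND strftime('%Y%m', DP2.dpDate) = strftime('%Y%m', DP1.dpDate)
             ORDER BY DP2.dpDate DESC
             LIMIT 1) AS Close
      FROM DailyPrices AS DP1
     WHERE DP1.dpTicker = 'GLE.PA'
     GROUP BY strftime('%Y%m', DP1.dpDate)
     ORDER BY month
""").fetchall()
print(monthly)
```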
Thanks CL!! This is good learning for a SQLite newbie, especially for the use of ALIASES.
Your suggestions produce generally good results, however, when running a couple of test runs I noticed occasional errors of the following style:
bash-3.2$ sqlite3 myShares < tst.sql
month       Open        High        Low         Close
----------  ----------  ----------  ----------  ----------
2006-03     75.4675     76.5852     70.4136     74.4956
2006-04     75.0787     75.9048     70.5108     72.7948
2006-05                 77.0225     68.5184     70.7538
2006-06     70.5594     73.4751     64.7767     72.7462
2006-07     72.5518     75.2245     68.2269     74.0582
bash-3.2$
As you can notice, the Open price for the month of May 2006 is missing. I verified that the data actually exists:
bash-3.2$ sqlite3 myShares < test.sql
dpTicker dpDate dpOpen dpHigh dpLow dpClose dpVolume dpAdjClose dpCreated dpModified
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ------------------- -------------------
BNP.PA 2006-04-26 72.0173 73.1835 72.0173 72.892 2623400 48.8861 2015-12-08 12:04:22 2015-12-08 12:04:22
BNP.PA 2006-04-27 73.8153 74.0096 72.5032 73.6209 6001400 49.375 2015-12-08 12:04:22 2015-12-08 12:04:22
BNP.PA 2006-04-28 73.4751 73.9611 72.6976 72.7948 4133300 48.8209 2015-12-08 12:04:22 2015-12-08 12:04:22
BNP.PA 2006-05-02 72.5518 73.5723 72.3574 73.2807 3085400 49.1468 2015-12-08 12:04:22 2015-12-08 12:04:22
BNP.PA 2006-05-03 73.8639 74.0096 72.5518 72.649 3290400 48.7231 2015-12-08 12:04:22 2015-12-08 12:04:22
BNP.PA 2006-05-04 72.892 73.5237 72.2602 73.3779 3640300 49.212 2015-12-08 12:04:22 2015-12-08 12:04:22
BNP.PA 2006-05-05 73.6209 74.8357 73.4751 74.7872 3255600 50.1572 2015-12-08 12:04:22 2015-12-08 12:04:22
bash-3.2$
For each of the tickers in the SQLite database, I have typically around 4 of those errors over a 16 year period.
As far as my SQLite newbie knowledge reaches, the data in the table is fine.
Any idea why the query occasionally skips a value in a summary record? Generally, the discrepancy occurs on the Open price, but from time to time also on the Close price.
Best regards,
GAM
For the record, the following is the query I have executed
SELECT
strftime ('%Y-%m', dpDate) AS month,
(SELECT dpOpen
FROM DailyPrices
WHERE dpTicker = 'BNP.PA'
AND dpDate =
(SELECT min(dpDate)
FROM DailyPrices AS DP2
WHERE strftime('%Y%m', DP2.dpDate) = strftime('%Y%m', DP1.dpDate)
)
ORDER BY dpDate ASC
LIMIT 1) AS Open,
max(dpHigh) AS High,
min (dpLow) AS Low,
(SELECT dpClose
FROM DailyPrices
WHERE dpTicker = 'BNP.PA'
AND dpDate =
(SELECT max(dpDate)
FROM DailyPrices AS DP2
WHERE strftime('%Y%m', DP2.dpDate) = strftime('%Y%m', DP1.dpDate)
)
) AS Close
FROM DailyPrices AS DP1
WHERE dpTicker ='BNP.PA'
AND strftime('%Y-%m', dpDate) > '2006-02'
AND strftime('%Y-%m', dpDate) < '2006-08'
GROUP BY strftime('%Y-%m', dpDate);
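One plausible explanation, which the thread itself does not confirm: the innermost min(dpDate)/max(dpDate) subqueries are not restricted to the ticker, so they return the first or last date of the month across all tickers. If some other ticker traded on a day when BNP.PA did not (for example 2006-05-01), the outer "dpDate = ..." comparison finds no BNP.PA row and the scalar subquery yields NULL. The sketch below reproduces this with Python's built-in sqlite3 module; the OTH.XX row is made up to stand in for such a ticker.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DailyPrices (dpTicker TEXT, dpDate TEXT, dpOpen REAL)")
conn.executemany(
    "INSERT INTO DailyPrices VALUES (?, ?, ?)",
    [
        ("BNP.PA", "2006-04-28", 73.4751),
        ("BNP.PA", "2006-05-02", 72.5518),
        ("BNP.PA", "2006-05-03", 73.8639),
        ("OTH.XX", "2006-05-01", 10.0),  # hypothetical ticker that traded on May 1st
    ],
)

# {extra} is replaced by an optional ticker filter on the innermost subquery.
template = """
    SELECT (SELECT dpOpen
              FROM DailyPrices
             WHERE dpTicker = 'BNP.PA'
               AND dpDate = (SELECT min(DP2.dpDate)
                               FROM DailyPrices AS DP2
                              WHERE strftime('%Y%m', DP2.dpDate) = '200605'
                                {extra})) AS Open
"""

# As in the posted query: min(dpDate) is taken over every ticker.
unfiltered = conn.execute(template.format(extra="")).fetchone()[0]
# With the ticker filter added to the innermost subquery.
filtered = conn.execute(
    template.format(extra="AND DP2.dpTicker = 'BNP.PA'")
).fetchone()[0]
print(unfiltered, filtered)
```

If this is indeed the cause, adding the same dpTicker condition to both innermost subqueries (for Open and for Close) should fill the gaps.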

Retrieve a table of tallied numbers, best way

I have a query that runs as part of a function which produces a one-row table full of counts, averages, and comma-separated lists, like this:
select
(select
count(*)
from vw_disp_details
where round = 2013
and rating = 1) applicants,
(select
count(*)
from vw_disp_details
where round = 2013
and rating = 1
and applied != 'yes') s_applicants,
(select
LISTAGG(discipline, ',')
WITHIN GROUP (ORDER BY discipline)
from (select discipline,
count(*) discipline_number
from vw_disp_details
where round = 2013
and rating = 1
group by discipline)) disciplines,
(select
LISTAGG(discipline_count, ',')
WITHIN GROUP (ORDER BY discipline)
from (select discipline,
count(*) discipline_count
from vw_disp_details
where round = 2013
and rating = 1
group by discipline)) disciplines_count,
(select
round(avg(util.getawardstocols(application_id,'1','AWARD_NAME')), 2)
from vw_disp_details
where round = 2013
and rating = 1) average_award_score,
(select
round(avg(age))
from vw_disp_details
where round = 2013
and rating = 1) average_age
from dual;
Except that instead of 6 main sub-queries there are 23.
This returns something like this (if it were a CSV):
applicants | s_applicants | disciplines | disciplines_count | average_award_score | average_age
107 | 67 | "speed,accuracy,strength" | 3 | 97 | 23
Now I am programmatically swapping out the "rating = 1" part of the where clauses for other expressions. They all work rather quickly except for the "rating = 1" one which takes about 90 seconds to run and that is because the rating column in the vw_disp_details view is itself compiled by a sub-query:
(SELECT score
FROM read r,
eval_criteria_lookup ecl
WHERE r.criteria_id = ecl.criteria_id
AND r.application_id = a.lgo_application_id
AND criteria_description = 'Overall Score'
AND type = 'ABC'
) reader_rank
So when the function runs this extra query seems to slow everything down dramatically.
My question is: is there a better (more efficient) way to run a query like this, which is basically just a series of counts and averages, and how can I refactor it so that the rating = 1 query doesn't take 90 seconds to run?
You could choose to MATERIALIZE the vw_disp_details VIEW. That would pre-calculate the value of the rating column. There are various options for how up-to-date a materialized view is kept, you would probably want to use the ON COMMIT clause so that vw_disp_details is always correct.
Have a look at the official documentation and see if that would work for you.
http://docs.oracle.com/cd/B28359_01/server.111/b28286/statements_6002.htm
Do most of your queries in a single one. Instead of doing:
select
(select count(*) from my_tab) as count_all,
(select avg(age) from my_tab) as avg_age,
(select avg(mypkg.get_award(application_id)) from my_tab) as avg_app_id
from dual;
Just do:
select count(*), avg(age),avg(mypkg.get_award(application_id)) from my_tab;
And then, maybe you can do some union all for the other results. But this step all by itself should help.
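A rough sketch of the single-pass idea, run against an in-memory SQLite database as a stand-in for Oracle (an assumption made only so the example is runnable). The table name my_tab follows the snippet above, the applied and rating columns follow the question, and the rows are made up. The CASE expression for the "applied != 'yes'" count is an addition that goes one step beyond the suggestion above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_tab (age INTEGER, applied TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO my_tab VALUES (?, ?, ?)",
    [(20, "yes", 1), (30, "no", 1), (40, "no", 1), (50, "yes", 2)],
)

# One scalar subquery per value: the table is scanned three times.
separate = conn.execute("""
    SELECT (SELECT count(*) FROM my_tab WHERE rating = 1),
           (SELECT count(*) FROM my_tab WHERE rating = 1 AND applied != 'yes'),
           (SELECT avg(age) FROM my_tab WHERE rating = 1)
""").fetchone()

# Same values from a single scan; the narrower count becomes a conditional sum.
single = conn.execute("""
    SELECT count(*),
           sum(CASE WHEN applied != 'yes' THEN 1 ELSE 0 END),
           avg(age)
      FROM my_tab
     WHERE rating = 1
""").fetchone()
print(separate, single)
```

Whether this helps in the original case depends on how expensive each evaluation of the view's rating column is; the sketch only shows that the two forms return the same values.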
I was able to solve this issue by doing two things. First, I created a new view that displays only the results I need, which gave me marginal gains in speed. Second, in that view I moved the where clause of the sub-query that caused the lag into the where clause of the view, and tacked on the result of the sub-query as a column in the view. This still returns the same results because there will always be records in the table the sub-query accessed for each row of the view query.
SELECT
a.application_id,
util.getstatus (a.application_id) status,
(SELECT score
FROM applicant_read ar,
eval_criteria_lookup ecl
WHERE ar.criteria_id = ecl.criteria_id
AND ar.application_id = a.application_id
AND criteria_description = 'Overall Score' -- these two fields
AND type = 'ABC' -- are equivalent to criteria_id = 15
) score,
as.test_total test_total
FROM application a,
applicant_scores as
WHERE a.application_id = as.application_id(+);
Became
SELECT
a.application_id,
util.getstatus (a.application_id) status,
ar.score,
as.test_total test_total
FROM application a,
applicant_scores as,
applicant_read ar
WHERE a.application_id = as.application_id(+)
AND ar.application_id = a.application_id(+)
AND ar.criteria_id = 15;

Resources