I have a SQLite-table with columns author, book, release_date and seller.
Now I need to find a way to get only the top 10 books of each author with the latest release_date.
If possible, I also need the seller that appears "most often" among each author's 10 most recently released books. The result should be a simple table with author and seller only.
This problem is really driving me crazy.
Is at least one part possible in a single SQLite query?
SQLite v3.25 which came out in September 2018 added window functions.
You can calculate the rank of each book by date, per author, with:
CREATE TABLE books
(
author varchar(10),
title varchar(10),
release date
);
INSERT INTO books VALUES
('aaa','ta1','2018-01-01'),
('aaa','ta2','2018-02-01'),
('aaa','ta3','2018-03-01'),
('aaa','ta4','2018-05-01'),
('bbb','tb1','2018-05-01'),
('bbb','tb2','2018-06-01')
;
SELECT
author,
title,
release,
row_number() OVER (partition by author ORDER BY release desc) AS row_number
FROM books
The expression row_number() OVER (partition by author ORDER BY release desc) AS row_number numbers the rows of each author separately, starting from the most recent release date.
This produces:
author title release row_number
aaa ta4 2018-05-01 1
aaa ta3 2018-03-01 2
aaa ta2 2018-02-01 3
aaa ta1 2018-01-01 4
bbb tb2 2018-06-01 1
bbb tb1 2018-05-01 2
Once you have the row number, you can filter the top N items with a simple WHERE row_number <= N, e.g. for the last 2 books per author:
select * from (
SELECT
author,
title,
release,
row_number() OVER (partition by author ORDER BY release desc) AS row_number
FROM books )
where row_number<=2
This returns:
author title release row_number
aaa ta4 2018-05-01 1
aaa ta3 2018-03-01 2
bbb tb2 2018-06-01 1
bbb tb1 2018-05-01 2
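For the second part of the question (the seller that appears most often among each author's latest books), one possible approach is to count sellers over the ranked rows and then rank those counts again. The following is only a sketch, run through Python's sqlite3 module and requiring SQLite 3.25 or newer. The table layout follows the question's columns (author, book, release_date, seller), the seller values s1 to s3 are made up, TOP_N is 3 instead of 10 to suit the tiny sample, and breaking ties alphabetically by seller is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (author TEXT, book TEXT, release_date TEXT, seller TEXT);
INSERT INTO books VALUES
  ('aaa','ta1','2018-01-01','s1'),
  ('aaa','ta2','2018-02-01','s2'),
  ('aaa','ta3','2018-03-01','s2'),
  ('aaa','ta4','2018-05-01','s1'),
  ('bbb','tb1','2018-05-01','s3'),
  ('bbb','tb2','2018-06-01','s3');
""")

TOP_N = 3  # the question asks for 10; 3 keeps the sample small

query = """
WITH ranked AS (
  -- number each author's books from newest to oldest
  SELECT author, seller,
         row_number() OVER (PARTITION BY author ORDER BY release_date DESC) AS rn
  FROM books
),
seller_counts AS (
  -- count sellers among the newest :n books of each author
  SELECT author, seller, COUNT(*) AS cnt
  FROM ranked
  WHERE rn <= :n
  GROUP BY author, seller
),
best AS (
  -- rank sellers per author by frequency (ties: alphabetical, an assumption)
  SELECT author, seller,
         row_number() OVER (PARTITION BY author ORDER BY cnt DESC, seller) AS pos
  FROM seller_counts
)
SELECT author, seller FROM best WHERE pos = 1 ORDER BY author
"""

top_sellers = conn.execute(query, {"n": TOP_N}).fetchall()
print(top_sellers)  # [('aaa', 's2'), ('bbb', 's3')]
```

If all sellers tied for first place should be returned, rank() could be used in place of the second row_number().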
Related
I have a table with data like below:
Log Table:
User Id  Login Date
1        2022-01-03
1        2022-01-04
1        2022-01-10
1        2022-01-11
1        2022-01-12
1        2022-01-23
1        2022-01-25
1        2022-01-26
1        2022-01-27
1        2022-01-28
What I'm trying to do is create a query that returns the rows of the latest run of consecutive login dates ending on the day before var_date, where var_date is a parameter.
If var_date is 2022-01-29, then the result is:
User Id  Login Date
1        2022-01-25
1        2022-01-26
1        2022-01-27
1        2022-01-28
If var_date is 2022-01-30, then no result is returned, since 2022-01-29 is not in the table.
If var_date is 2022-01-24, then the query returns the row with 2022-01-23 as the login date.
How can I do this in SQLite?
Thank you.
This question is a variant of gaps and islands, with the islands being clusters of records per user with continuous dates. Here is one approach using analytic functions:
WITH cte AS (
SELECT *, CASE WHEN julianday(LoginDate) -
julianday(LAG(LoginDate) OVER (PARTITION BY UserID
ORDER BY LoginDate))
> 1 THEN 1 ELSE 0 END AS counter
FROM yourTable
),
cte2 AS (
SELECT *, SUM(counter) OVER (PARTITION BY UserID ORDER BY LoginDate) AS grp
FROM cte
)
SELECT UserID, LoginDate
FROM cte2 t1
WHERE LoginDate < '2022-01-29' AND
grp = (SELECT t2.grp FROM cte2 t2
WHERE t2.UserID = t1.UserID AND t2.LoginDate = '2022-01-28');
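The two CTEs assign a pseudo group number to each cluster of consecutive dates per user: the first flags every row that starts a new cluster (a gap of more than one day since the previous login), and the second turns those flags into a running total. The final query returns all records earlier than the target date whose group value matches that of the day immediately before the target date. Hence, if a user has no record on that preceding day, the query returns an empty set.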
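To see how the group numbers come about, the two CTEs can be run on their own against the question's sample data. This is a small sketch using Python's sqlite3 module (SQLite 3.25 or newer is assumed for the window functions); the table name yourTable is taken from the query above, and the column types are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (UserID INTEGER, LoginDate TEXT)")
dates = ["2022-01-03", "2022-01-04", "2022-01-10", "2022-01-11", "2022-01-12",
         "2022-01-23", "2022-01-25", "2022-01-26", "2022-01-27", "2022-01-28"]
conn.executemany("INSERT INTO yourTable VALUES (1, ?)", [(d,) for d in dates])

query = """
WITH cte AS (
  -- counter = 1 when the gap to the previous login is more than one day
  SELECT *, CASE WHEN julianday(LoginDate) -
                      julianday(LAG(LoginDate) OVER (PARTITION BY UserID
                                                     ORDER BY LoginDate)) > 1
                 THEN 1 ELSE 0 END AS counter
  FROM yourTable
),
cte2 AS (
  -- the running total of counter gives each island its own number
  SELECT *, SUM(counter) OVER (PARTITION BY UserID ORDER BY LoginDate) AS grp
  FROM cte
)
SELECT LoginDate, counter, grp FROM cte2 ORDER BY LoginDate
"""

rows = conn.execute(query).fetchall()
for login_date, counter, grp in rows:
    print(login_date, counter, grp)
```

On this data the four dates from 2022-01-25 to 2022-01-28 share one grp value, while 2022-01-23 sits in a group of its own.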
Use a recursive CTE:
WITH cte(UserId, LoginDate) AS (
SELECT :var_user_id, :var_date
UNION ALL
SELECT UserId, date(c.LoginDate, '-1 day')
FROM cte c
WHERE EXISTS (SELECT 1 FROM tablename t WHERE t.UserId = c.UserId AND t.LoginDate = date(c.LoginDate, '-1 day'))
)
SELECT *
FROM cte
WHERE LoginDate < (SELECT MAX(LoginDate) FROM cte);
Change :var_user_id and :var_date to the values that you want for the user's id and the date.
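As a rough illustration of supplying those two values, the sketch below binds them as named parameters through Python's sqlite3 module. The table name tablename comes from the query above; the column types, the helper name latest_streak and the explicit RECURSIVE keyword are additions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (UserId INTEGER, LoginDate TEXT)")
dates = ["2022-01-03", "2022-01-04", "2022-01-10", "2022-01-11", "2022-01-12",
         "2022-01-23", "2022-01-25", "2022-01-26", "2022-01-27", "2022-01-28"]
conn.executemany("INSERT INTO tablename VALUES (1, ?)", [(d,) for d in dates])

query = """
WITH RECURSIVE cte(UserId, LoginDate) AS (
  SELECT :var_user_id, :var_date
  UNION ALL
  -- step back one day at a time while a login exists for that day
  SELECT c.UserId, date(c.LoginDate, '-1 day')
  FROM cte c
  WHERE EXISTS (SELECT 1 FROM tablename t
                WHERE t.UserId = c.UserId
                  AND t.LoginDate = date(c.LoginDate, '-1 day'))
)
SELECT UserId, LoginDate
FROM cte
WHERE LoginDate < (SELECT MAX(LoginDate) FROM cte)  -- drop the seed row (var_date)
ORDER BY LoginDate
"""

def latest_streak(user_id, var_date):
    """Hypothetical helper: login dates of the streak ending the day before var_date."""
    params = {"var_user_id": user_id, "var_date": var_date}
    return [row[1] for row in conn.execute(query, params)]

print(latest_streak(1, "2022-01-29"))
print(latest_streak(1, "2022-01-30"))
print(latest_streak(1, "2022-01-24"))
```

The three calls correspond to the three cases described in the question.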
I have created a dataset "Orders" to test sqlite with structure
CREATE TABLE Orders (
OrderID INTEGER PRIMARY KEY AUTOINCREMENT,
OrderDate TIMESTAMP DEFAULT (CURRENT_TIMESTAMP),
CustomerID VARCHAR(20),
OrderValue DECIMAL (8, 3) NOT NULL
);
I filled the table with sample data
ID Date Customer Value($)
6 11-09-2019 Eva 6946.3
7 11-10-2019 John 850.6
8 11-11-2019 Helen 9855.0
9 11-12-2019 Maria 765.2
11 11-13-2019 Gui 1879.5 --< I removed ID 10 purposely
12 11-14-2019 Eric 600.0
13 11-15-2019 Paul 12890.1
Given the parameter :date, how could I get both records 9 and 11 in the same row, i.e. the last sale with orderdate = :date and the one immediately following it? Or, if I changed record 9 to the same date as record 11, how could I get record 8 (the last sale of the previous day)?
pseudo-code
select last 2 order where orderdate <= :date inner join (? a relation to put both in same row)
Step one is to replace your 'MM-DD-YYYY' date strings with ones that can be sorted, 'YYYY-MM-DD' for example (then you can use the date and time functions on them as well if needed). Your orderdate column has a default value of CURRENT_TIMESTAMP, but you're showing only the date, and not in the format CURRENT_TIMESTAMP produces, so I assume you're inserting your dates manually instead of letting them be generated automatically on insert? The column names of your sample data table don't match the ones in your table definition either, which is confusing.
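One way to do that conversion in place is with substr(), sketched here through Python's sqlite3 module. The rows are the sample data from the question; the assumption is that every existing value has exactly the 'MM-DD-YYYY' shape, and the LIKE pattern is only a guard so that rows already converted are left alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (
  OrderID INTEGER PRIMARY KEY AUTOINCREMENT,
  OrderDate TIMESTAMP DEFAULT (CURRENT_TIMESTAMP),
  CustomerID VARCHAR(20),
  OrderValue DECIMAL (8, 3) NOT NULL
);
INSERT INTO Orders VALUES
  (6,  '11-09-2019', 'Eva',   6946.3),
  (7,  '11-10-2019', 'John',  850.6),
  (8,  '11-11-2019', 'Helen', 9855.0),
  (9,  '11-12-2019', 'Maria', 765.2),
  (11, '11-13-2019', 'Gui',   1879.5),
  (12, '11-14-2019', 'Eric',  600.0),
  (13, '11-15-2019', 'Paul',  12890.1);
""")

# Rewrite 'MM-DD-YYYY' as 'YYYY-MM-DD': year (chars 7-10), month (1-2), day (4-5)
conn.execute("""
UPDATE Orders
SET OrderDate = substr(OrderDate, 7, 4) || '-' ||
                substr(OrderDate, 1, 2) || '-' ||
                substr(OrderDate, 4, 2)
WHERE OrderDate LIKE '__-__-____'
""")

# The strings now sort chronologically and work with the date functions
sorted_dates = [r[0] for r in conn.execute(
    "SELECT OrderDate FROM Orders ORDER BY OrderDate")]
next_day = conn.execute(
    "SELECT date(OrderDate, '+1 day') FROM Orders WHERE OrderID = 9").fetchone()[0]
print(sorted_dates)
print(next_day)
```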
Anyway, since you said you want the values in the same row, the lead() window function comes into play (requires SQLite 3.25 or newer). Something like:
WITH cte AS
(SELECT orderid, orderdate, customerid, ordervalue
, lead(orderid, 1) OVER bydate AS next_id
, lead(orderdate, 1) OVER bydate AS next_date
, lead(customerid, 1) OVER bydate AS next_customer
, lead(ordervalue, 1) OVER bydate AS next_value
FROM orders
WINDOW bydate AS (ORDER BY orderdate))
SELECT * FROM cte WHERE orderdate = :date;
gives for a :date of '2019-11-12':
orderid orderdate customerid ordervalue next_id next_date next_customer next_value
---------- ---------- ---------- ---------- ---------- ---------- ------------- ----------
9 2019-11-12 Maria 765.2 11 2019-11-13 Gui 1879.5
I am attempting to find the top n records when grouped by multiple attributes. I believe it is related to this problem, but I am having difficulty adapting the solution described to my situation.
To simplify, I have a table with columns (did is short for device_id):
id int
did int
dateVal dateTime
I am trying to find the top n device_id's for each day with the most rows.
For example (ignoring id and the time part of dateTime),
did dateVal
1 2017-01-01
1 2017-01-01
1 2017-01-01
2 2017-01-01
3 2017-01-01
3 2017-01-01
1 2017-01-02
1 2017-01-02
2 2017-01-02
2 2017-01-02
2 2017-01-02
3 2017-01-02
Finding the top 2 would yield...
1, 2017-01-01
3, 2017-01-01
2, 2017-01-02
1, 2017-01-02
My current naive approach is only giving me the top 2 across all dates.
--Using SQLite
select date(dateVal) || did
from data
group by date(dateVal), did
order by count(*) desc
limit 2
I'm using the concatenation operator so that I can later extract the rows.
I am using SQLite, but any general SQL explanation would be appreciated.
Similarly to this question, define a CTE that computes all device counts for your desired groups, then use it in a WHERE ... IN subquery, limited to the top 2 devices for that date:
WITH device_counts AS (
SELECT did, date(dateval) AS dateval, COUNT(*) AS device_count
FROM data
GROUP BY did, date(dateval)
)
SELECT did, date(dateval) FROM device_counts DC_outer
WHERE did IN (
SELECT did
FROM device_counts DC_inner
WHERE DC_inner.dateval = DC_outer.dateval
GROUP BY did, date(dateval)
ORDER BY DC_inner.device_count DESC LIMIT 2
)
ORDER BY date(dateval), did
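On SQLite 3.25 or newer, the same top-n-per-day result can also be obtained with a window function instead of a correlated subquery. The following is a sketch run through Python's sqlite3 module on the question's sample data; the column alias day, the parameter :n and the tie-break on did are choices made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, did INTEGER, dateVal TEXT)")
sample = ([(1, "2017-01-01")] * 3 + [(2, "2017-01-01")] + [(3, "2017-01-01")] * 2 +
          [(1, "2017-01-02")] * 2 + [(2, "2017-01-02")] * 3 + [(3, "2017-01-02")])
conn.executemany("INSERT INTO data (did, dateVal) VALUES (?, ?)", sample)

query = """
WITH device_counts AS (
  SELECT did, date(dateVal) AS day, COUNT(*) AS device_count
  FROM data
  GROUP BY did, date(dateVal)
),
ranked AS (
  -- rank devices within each day by row count (ties broken by did, an assumption)
  SELECT did, day,
         row_number() OVER (PARTITION BY day
                            ORDER BY device_count DESC, did) AS rn
  FROM device_counts
)
SELECT did, day FROM ranked WHERE rn <= :n ORDER BY day, rn
"""

top = conn.execute(query, {"n": 2}).fetchall()
print(top)
# [(1, '2017-01-01'), (3, '2017-01-01'), (2, '2017-01-02'), (1, '2017-01-02')]
```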
I tested the query using SQL Server:
select top 2 did, dateVal
from (select did, dateVal, count(*) as c
from test
group by did, dateVal) as t
order by t.c desc
I am not an expert in Teradata or SQL, so I need some help counting the number of days a salesperson has attended a customer.
If a salesperson has attended a customer for 1 to 3 consecutive days, this is counted as 1; if the person has attended for 4 days, it is counted as 2.
Below is an example of the data and the result I want.
DATA:
Sales Person Date
John 1/03/2016
John 2/03/2016
John 3/03/2016
John 4/03/2016
John 5/03/2016
David 6/03/2016
David 7/03/2016
David 8/03/2016
David 9/03/2016
David 10/03/2016
David 11/03/2016
John 12/03/2016
John 13/03/2016
John 14/03/2016
John 15/03/2016
John 16/03/2016
John 17/03/2016
John 18/03/2016
John 19/03/2016
David 20/03/2016
Sue 21/03/2016
Sue 22/03/2016
Sue 23/03/2016
Lily 24/03/2016
Lily 25/03/2016
Lily 26/03/2016
Sue 27/03/2016
David 28/03/2016
John 29/03/2016
David 30/03/2016
John 31/03/2016
RESULT WANTED:
Sales Person Groups
John 6
David 4
Sue 2
Lily 1
Interesting problem.
Here is a solution using ordered analytical functions and nested derived tables.
The final number of points per person is in person_points. I used analytical function sum() instead of grouping because I wanted to show the intermediate steps. The rule that days in a 3-day period that overlap a previous group should not be counted was a little tricky to implement.
create table t ( person varchar(30), dt date);
insert into t values('John','2016-03-01');
insert into t values('John','2016-03-02');
insert into t values('John','2016-03-03');
insert into t values('John','2016-03-04');
insert into t values('John','2016-03-05');
insert into t values('David','2016-03-06');
insert into t values('David','2016-03-07');
insert into t values('David','2016-03-08');
insert into t values('David','2016-03-09');
insert into t values('David','2016-03-10');
insert into t values('David','2016-03-11');
insert into t values('John','2016-03-12');
insert into t values('John','2016-03-13');
insert into t values('John','2016-03-14');
insert into t values('John','2016-03-15');
insert into t values('John','2016-03-16');
insert into t values('John','2016-03-17');
insert into t values('John','2016-03-18');
insert into t values('John','2016-03-19');
insert into t values('David','2016-03-20');
insert into t values('Sue','2016-03-21');
insert into t values('Sue','2016-03-22');
insert into t values('Sue','2016-03-23');
insert into t values('Lily','2016-03-24');
insert into t values('Lily','2016-03-25');
insert into t values('Lily','2016-03-26');
insert into t values('Sue','2016-03-27');
insert into t values('David','2016-03-28');
insert into t values('John','2016-03-29');
insert into t values('David','2016-03-30');
insert into t values('John','2016-03-31');
select t_points.*
,sum(points) over(partition by person) person_points
from
(
select person, consecutive_group, min(dt) first_dt, max(dt) last_dt
, last_dt - first_dt + 1 n_days
,floor((n_days + 2) / 3)*3 + first_dt - 1 end_of_3day_period
,max(end_of_3day_period) over(partition by person order by consecutive_group rows between 1 preceding and 1 preceding) prev_end_3day_dt
,case when prev_end_3day_dt >= first_dt then prev_end_3day_dt - first_dt + 1 else 0 end overlapped_days
,n_days - overlapped_days n_days_no_overlap
, floor((n_days_no_overlap + 2)/3) points
from
(
select person,dt
,sum(begin_new_consecutive) over(partition by person order by dt rows unbounded preceding) consecutive_group
from
(
select person, dt
,max(dt) over(partition by person order by dt rows between 1 preceding and 1 preceding) prev_dt
,case when dt = prev_dt+1 then 0 else 1 end begin_new_consecutive
from t
) t_consecutive
) t_consecutive_group
group by 1,2
) t_points
order by 1,2 ;
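Outside the database, the counting rule can be read as greedy 3-day windows per person: the earliest date not yet covered opens a new window that spans that day and the next two, and each window scores one point. The Python sketch below is not a translation of the Teradata query, only an assumed reading of the rule; on the sample data it reproduces the totals listed under RESULT WANTED. The names attendance and count_groups are made up for the example.

```python
from datetime import date, timedelta

# Day-of-month values for March 2016, copied from the sample data
attendance = {
    "John":  [1, 2, 3, 4, 5, 12, 13, 14, 15, 16, 17, 18, 19, 29, 31],
    "David": [6, 7, 8, 9, 10, 11, 20, 28, 30],
    "Sue":   [21, 22, 23, 27],
    "Lily":  [24, 25, 26],
}

def count_groups(days):
    """Count greedy 3-day windows needed to cover all attendance dates."""
    points = 0
    covered_until = None  # last date covered by the most recent window
    for d in sorted(set(days)):
        if covered_until is None or d > covered_until:
            points += 1                          # open a new window at d
            covered_until = d + timedelta(days=2)
    return points

result = {
    person: count_groups([date(2016, 3, n) for n in day_numbers])
    for person, day_numbers in attendance.items()
}
print(result)  # {'John': 6, 'David': 4, 'Sue': 2, 'Lily': 1}
```

For example, John's visit on 31 March scores nothing because it falls inside the window opened on 29 March, which matches the overlap rule described above.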
I have this query that has no problem:
SELECT m.movie_name, cd.times_requested
FROM movie m,
(select *
from(
select movie_id, count(movie_id) as times_requested
from movie_queue
where status_id=0 or status_id=1
group by movie_id
) ab
where times_requested>1) cd
WHERE m.movie_id=cd.movie_id;
It returns the following list.
MOVIE_NAME TIMES_REQUESTED
----------------------------------------------------------------------
E.T. the Extra-Terrestrial 2
Indiana Jones and the Kingdom of the Crystal Skull 2
War of the Worlds 3
Unbreakable 3
Question:
How do I add another column showing the number of DVDs available for each movie? I cannot just join it, because I need to group the DVDs by movie_id first.
The table above is OK, but I want to add a third column containing the number of DVDs available for each movie. The problem is that the number of DVDs is stored in another table called DVDS. The structure of the DVDS table is similar to this:
DVD_ID MOVIE_ID DVD_ENTRY_DATE
---------- ---------- --------------
1 1 24-JUL-12
2 1 24-JUL-12
3 1 24-JUL-12
4 2 24-JUL-12
5 2 24-JUL-12
Desired Result:
Final table should look similar to the one below:
MOVIE_NAME TIMES_REQUESTED DVDS_AVAILABLE
-------------------------------------------------------------------
E.T. the Extra-Terrestrial 2 3
Indiana Jones and the Kingdom 2 1
War of the Worlds 3 3
Unbreakable 3 1
I tried the following code, but did not get the result I wanted
I am assuming I need to go to dvd table first and find all dvds that match the movie_id I want and group them by movie_id. I tried the code below, but instead of returning the 4 rows I want it is returning 72.
SELECT m.movie_name, pomid.times_requested, d.dvds_available
FROM movie m,
(select *
from(
select movie_id, count(movie_id) as times_requested
from movie_queue
where status_id=0 or status_id=1
group by movie_id
) mid
where times_requested>1) pomid,
(select movie_id, count(movie_id) as dvds_available
from dvd
group by movie_id) d
WHERE m.movie_id=pomid.movie_id;
Thanks for your suggestions in how to fix this.
A join to d.movie_id is missing, that's why you get too many rows. (Quick check: how many tables do I have? How many joins do I have?)
And I'd also add an outer join to get all movies, even when there are no dvd or movie_queue entries.
SELECT m.movie_name
,NVL(pomid.times_requested,0) times_requested
,NVL(d.dvds_available,0) dvds_available
FROM movie m
,(SELECT *
FROM (SELECT movie_id
,COUNT (movie_id) AS times_requested
FROM movie_queue
WHERE status_id = 0
OR status_id = 1
GROUP BY movie_id) mid
WHERE times_requested > 1) pomid
,(SELECT movie_id
,COUNT (movie_id) AS dvds_available
FROM dvd
GROUP BY movie_id) d
WHERE m.movie_id = pomid.movie_id(+)
AND m.movie_id = d.movie_id(+);
http://www.sqlfiddle.com/#!4/5437a/11
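The same idea can be written with explicit JOIN syntax, where each derived table carries its own ON condition and a forgotten join is easier to spot. The sketch below runs in SQLite through Python's sqlite3 module rather than Oracle, so COALESCE stands in for NVL. The rows are made-up sample data, and the inner join to the request counts (keeping only movies requested more than once, as in the question) is a choice made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (movie_id INTEGER PRIMARY KEY, movie_name TEXT);
CREATE TABLE movie_queue (movie_id INTEGER, status_id INTEGER);
CREATE TABLE dvd (dvd_id INTEGER PRIMARY KEY, movie_id INTEGER);
INSERT INTO movie VALUES
  (1, 'E.T. the Extra-Terrestrial'),
  (2, 'War of the Worlds'),
  (3, 'Unbreakable'),
  (4, 'Indiana Jones and the Kingdom of the Crystal Skull');
-- hypothetical queue entries: movie 4 has only one open request
INSERT INTO movie_queue VALUES (1,0),(1,1),(2,0),(2,0),(2,1),(3,0),(3,1),(4,0),(4,2);
-- hypothetical stock: movie 3 has no DVDs at all
INSERT INTO dvd (movie_id) VALUES (1),(1),(1),(2),(4);
""")

query = """
SELECT m.movie_name,
       q.times_requested,
       COALESCE(d.dvds_available, 0) AS dvds_available
FROM movie m
JOIN (SELECT movie_id, COUNT(*) AS times_requested
      FROM movie_queue
      WHERE status_id IN (0, 1)
      GROUP BY movie_id
      HAVING COUNT(*) > 1) q ON q.movie_id = m.movie_id
LEFT JOIN (SELECT movie_id, COUNT(*) AS dvds_available
           FROM dvd
           GROUP BY movie_id) d ON d.movie_id = m.movie_id
ORDER BY m.movie_id
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because the DVD counts are attached with a LEFT JOIN, a requested movie with no DVDs still appears, with 0 in the last column.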
Not sure I really understand your table structure, but this might get you started:
select movie_id,
sum(case when status_id in (0,1) then 1 else 0 end) as times_requested,
count(movie_id) as dvd_available
from movie_queue
group by movie_id;