Join two tables in SQLite and count

I have two tables named "likes" and "comments", and I want a result with the number of likes and comments for each user. I wrote the following query in SQLite, but the result is wrong for some users: for users that appear in both tables, each count is the number of likes multiplied by the number of comments.
SELECT
likes.liker_name, likes.liker_id, likes.profile_picture ,
COUNT(comments.commenter_name) AS comment_count, COUNT( likes.liker_id) AS like_count
FROM likes
LEFT JOIN comments
ON likes.liker_name = comments.commenter_name
GROUP BY
likes.liker_name
ORDER BY
COUNT( likes.liker_id) DESC
How can I get correct value of count for users that exist in both tables?

The problem is that some users have comments but no likes, others have likes but no comments, some have both and some have none. I therefore suggest counting each table separately, combining the two results with a UNION ALL query, and summing the combined rows per user:
SELECT
u.name, u.id, u.profile_picture,
SUM(u.like_count) AS like_count, SUM(u.comment_count) AS comment_count
FROM (
SELECT
liker_name AS name, liker_id AS id, profile_picture,
COUNT(*) AS like_count, 0 AS comment_count
FROM
likes
GROUP BY
liker_name, liker_id, profile_picture
UNION ALL
SELECT
commenter_name AS name, commenter_id AS id, profile_picture,
0 AS like_count, COUNT(*) AS comment_count
FROM
comments
GROUP BY
commenter_name, commenter_id, profile_picture
) AS u
GROUP BY
u.name, u.id, u.profile_picture
If you have a separate users table, you could also left join the likes count and the comments count subqueries to it:
SELECT
u.name, u.user_id AS id, u.profile_picture,
COALESCE(l.cnt, 0) AS like_count, COALESCE(c.cnt, 0) AS comment_count
FROM
users u
LEFT JOIN
(SELECT liker_id, COUNT(*) AS cnt
FROM likes
GROUP BY liker_id
) AS l
ON u.user_id = l.liker_id
LEFT JOIN
(SELECT commenter_id, COUNT(*) AS cnt
FROM comments
GROUP BY commenter_id
) AS c
ON u.user_id = c.commenter_id
WHERE l.cnt > 0 OR c.cnt > 0
However you write it, you must count the comments and the likes in separate subqueries. If you count after joining, each like row is repeated once for every comment by the same user (and vice versa), so both counts come out as the product of the two.
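The effect can be reproduced with a tiny in-memory database. This is only a sketch using Python's built-in sqlite3 module; the trimmed-down tables and the user 'alice' (2 likes, 3 comments) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE likes    (liker_id INTEGER, liker_name TEXT);
    CREATE TABLE comments (commenter_id INTEGER, commenter_name TEXT);
    -- hypothetical user: 2 likes and 3 comments
    INSERT INTO likes    VALUES (1, 'alice'), (1, 'alice');
    INSERT INTO comments VALUES (1, 'alice'), (1, 'alice'), (1, 'alice');
""")

# Counting after the join: each of the 2 likes pairs with each of the 3 comments.
joined = conn.execute("""
    SELECT COUNT(likes.liker_id), COUNT(comments.commenter_name)
    FROM likes
    LEFT JOIN comments ON likes.liker_name = comments.commenter_name
    GROUP BY likes.liker_name
""").fetchone()

# Counting each table in its own subquery, then joining the two one-row results.
separate = conn.execute("""
    SELECT l.cnt, c.cnt
    FROM (SELECT liker_id, COUNT(*) AS cnt FROM likes GROUP BY liker_id) AS l
    JOIN (SELECT commenter_id, COUNT(*) AS cnt FROM comments GROUP BY commenter_id) AS c
      ON l.liker_id = c.commenter_id
""").fetchone()

print(joined)    # (6, 6)
print(separate)  # (2, 3)
```

The joined version reports 6 for both counts (2 x 3), while the pre-aggregated version reports the real 2 likes and 3 comments.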

Related

How do I include all max values within a row?

I'm very new to SQL, so I apologize if my question isn't completely accurate.
The question I'm trying to answer with this query is "What is the most popular music genre in each country?" I've had to use a subquery and it works, but I found that for a few countries in the table, more than one genre has the MAX value. I'm stuck with how to edit my query so that all genres with the max value show in the results. Here is my code, using DB Browser for SQLite:
SELECT BillingCountry AS Country , name AS Genre , MAX(genre_count) AS Purchases
FROM (
SELECT i.BillingCountry, g.name, COUNT(g.genreid) AS genre_count
FROM Invoice i
JOIN InvoiceLine il
ON il.InvoiceId = i.InvoiceId
JOIN TRACK t
ON il.trackid = t.TrackId
JOIN Genre g
ON t.genreid = g.GenreId
GROUP BY 1,2
) sub
GROUP BY 1
Here is an example of the result:
| Country   | Genre | Purchases |
|-----------|-------|-----------|
| Argentina | Punk  | 9         |
| Australia | Rock  | 22        |
But when I run just the subquery that counts the purchases, Argentina has two genres with 9 purchases (the maximum for that country). How do I adjust my query to include both and not just the first one?
You can do it with the RANK() window function:
SELECT BillingCountry, name, genre_count
FROM (
SELECT i.BillingCountry, g.name, COUNT(*) AS genre_count,
RANK() OVER (PARTITION BY i.BillingCountry ORDER BY COUNT(*) DESC) rnk
FROM Invoice i
INNER JOIN InvoiceLine il ON il.InvoiceId = i.InvoiceId
INNER JOIN TRACK t ON il.trackid = t.TrackId
INNER JOIN Genre g ON t.genreid = g.GenreId
GROUP BY i.BillingCountry, g.name
)
WHERE rnk = 1
This will return the ties in separate rows.
If you want 1 row for each country, you could also use GROUP_CONCAT():
SELECT BillingCountry, GROUP_CONCAT(name) AS name, MAX(genre_count) AS genre_count
FROM (
SELECT i.BillingCountry, g.name, COUNT(*) AS genre_count,
RANK() OVER (PARTITION BY i.BillingCountry ORDER BY COUNT(*) DESC) rnk
FROM Invoice i
INNER JOIN InvoiceLine il ON il.InvoiceId = i.InvoiceId
INNER JOIN TRACK t ON il.trackid = t.TrackId
INNER JOIN Genre g ON t.genreid = g.GenreId
GROUP BY i.BillingCountry, g.name
)
WHERE rnk = 1
GROUP BY BillingCountry
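To see the tie handling in isolation, here is a small sketch using Python's sqlite3 module. The single purchases table and its rows are invented stand-ins for the joined Invoice/InvoiceLine/Track/Genre data, and it assumes the bundled SQLite is 3.25 or newer (needed for window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical flattened data: one row per purchased track
    CREATE TABLE purchases (country TEXT, genre TEXT);
    INSERT INTO purchases VALUES
        ('Argentina', 'Punk'), ('Argentina', 'Punk'),
        ('Argentina', 'Rock'), ('Argentina', 'Rock'),
        ('Argentina', 'Jazz'),
        ('Australia', 'Rock'), ('Australia', 'Rock'), ('Australia', 'Jazz');
""")

# RANK() gives every genre tied for the highest count the rank 1.
rows = conn.execute("""
    SELECT country, genre, genre_count
    FROM (
        SELECT country, genre, COUNT(*) AS genre_count,
               RANK() OVER (PARTITION BY country ORDER BY COUNT(*) DESC) AS rnk
        FROM purchases
        GROUP BY country, genre
    )
    WHERE rnk = 1
    ORDER BY country, genre
""").fetchall()

for row in rows:
    print(row)
# ('Argentina', 'Punk', 2)
# ('Argentina', 'Rock', 2)
# ('Australia', 'Rock', 2)
```

Both Argentine genres survive the rnk = 1 filter because tied rows share the same rank.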

Getting a min(date) AND max(date) AND their respective titles

I have three tables that I would like to select from:
Table 1 has static information about a user, such as their idnumber, name, and registration date.
Table 2 has the idnumber of the user, the course number, and the date they registered for the course.
Table 3 has the course number and the title of the course.
I am trying to write one query that selects the columns mentioned in table 1, together with the most recent course the user registered for (name and date registered) and the first course they registered for (name and date registered).
Here is what I came up with:
SELECT u.idst, u.userid, u.firstname, u.lastname, u.email, u.register_date,
MIN(l.date_inscr) as mindate, MAX(l.date_inscr) as maxdate, lc.coursename
FROM table1 u,table3 lc
LEFT JOIN table2 l
ON l.idCourse = lc.idCourse
WHERE u.idst = 12787
AND u.idst = l.idUser
This gives me everything I need, and the dates are correct, but I have no idea how to display BOTH course names: the most recent and the first.
Any help would be great.
Thanks!!!
You can get your desired results by generating the min/max date_inscr for each user in a derived table and then joining that twice to table2 and table3, once to get each course name:
SELECT u.idst, u.userid, u.firstname, u.lastname, u.email, u.register_date,
l.mindate, lc1.coursename as first_course,
l.maxdate, lc2.coursename as latest_course
FROM table1 u
LEFT JOIN (SELECT idUser, MIN(date_inscr) AS mindate, MAX(date_inscr) AS maxdate
FROM table2
WHERE idUser = 12787
GROUP BY idUser
) l ON l.idUser = u.idst
LEFT JOIN table2 l1 ON l1.idUser = l.idUser AND l1.date_inscr = l.mindate
LEFT JOIN table3 lc1 ON lc1.idCourse = l1.idCourse
LEFT JOIN table2 l2 ON l2.idUser = l.idUser AND l2.date_inscr = l.maxdate
LEFT JOIN table3 lc2 ON lc2.idCourse = l2.idCourse
WHERE u.idst = 12787
As @BillKarwin pointed out, this is more easily done using two separate queries.
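The derived-table approach can be tried out with a few made-up rows. This sketch uses Python's sqlite3 module; the reduced column lists, the user name and the course names are assumptions for illustration, and the derived table is grouped by idUser with the user filter applied in the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (idst INTEGER PRIMARY KEY, firstname TEXT);
    CREATE TABLE table2 (idUser INTEGER, idCourse INTEGER, date_inscr TEXT);
    CREATE TABLE table3 (idCourse INTEGER PRIMARY KEY, coursename TEXT);
    -- hypothetical sample data
    INSERT INTO table1 VALUES (12787, 'Ann');
    INSERT INTO table3 VALUES (1, 'Algebra'), (2, 'Biology'), (3, 'Chemistry');
    INSERT INTO table2 VALUES
        (12787, 2, '2019-03-01'),
        (12787, 1, '2018-01-15'),
        (12787, 3, '2020-06-30');
""")

# Min/max dates per user come from the derived table l; table2 and table3
# are then joined twice, once for each of the two dates.
row = conn.execute("""
    SELECT u.idst, l.mindate, lc1.coursename, l.maxdate, lc2.coursename
    FROM table1 u
    LEFT JOIN (SELECT idUser, MIN(date_inscr) AS mindate, MAX(date_inscr) AS maxdate
               FROM table2
               GROUP BY idUser) l ON l.idUser = u.idst
    LEFT JOIN table2 l1 ON l1.idUser = l.idUser AND l1.date_inscr = l.mindate
    LEFT JOIN table3 lc1 ON lc1.idCourse = l1.idCourse
    LEFT JOIN table2 l2 ON l2.idUser = l.idUser AND l2.date_inscr = l.maxdate
    LEFT JOIN table3 lc2 ON lc2.idCourse = l2.idCourse
    WHERE u.idst = ?
""", (12787,)).fetchone()

print(row)  # (12787, '2018-01-15', 'Algebra', '2020-06-30', 'Chemistry')
```

If a user registered for two courses on the same first or last date, the joins on date_inscr would return one row per matching course.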

Pull data from two separate SQL tables based on a JOIN but that uses a column not in the JOIN or the SELECT

This is based on a Khan Academy course. I have 2 SQLite tables:
CREATE TABLE table1 (id STRING PRIMARY KEY, charge_id TEXT, amount INTEGER, currency INTEGER, country STRING);
INSERT INTO table1
( id, charge_id, amount, currency, country) VALUES
('0xb01', '0x1', 2000, 'USD', 'USA'),
('0x0a1', '0x1', 500, 'USD', 'USA'),
('0x0c1', '0x1', 1000, 'CAD', 'USA'),
('0xs31', '0x4', 1000, 'YEN', 'CA');
CREATE TABLE table2 (id STRING PRIMARY KEY, charge_id TEXT, value VARIABLE);
INSERT INTO table2
( id, charge_id, value ) VALUES
('0x34s', '0x1', '123 main street'),
('0x3ze', '0x1', 'merchant-id-001'),
('0x3w2', '0x2', 'zip-code-90210' ),
('0x35k', '0x2', 'merchant-id-002');
I would like to SELECT the amount, currency and country from table 1 (Charges) and join with table 2 (Metadata) based on the id. Charges uses ID, while Metadata stores meta tags, with a unique identifier [id] equal to the charge [id] from Charges. I want to group the total amount and currency by merchant_id, for only those charges that were made in the USA.
Step-by-step pseudo code:
(1) find all charges in the USA (Charges country)
(2) match all charge_ids from Charges (id) to charges in Metadata (id)
(3) separate each charge by the merchant_id (Metadata value)
(4) display the total amount, currency by merchant_id (amount, Charges currency, value)
This is difficult because:
(1) I want to select from Charges and
(2) join to Metadata by the [id]
(3) but each Metadata record only has the charge_id and a metadata tag, which would match the merchant_id with the charge
The query result I would like is:
value (merchant id)  currency  total amount
-------------------  --------  ------------
merchant-id-001      usd       2500
merchant-id-001      cad       1000
merchant-id-002      yen       200
merchant-id-002      cad       50
Currently I have this query but it does not seem to be working:
select table1.amount, table1.currency, table1.country, count(*)
from table1
LEFT JOIN table1
UNION ALL
SELECT table2.value
FROM CHARGES_table2
LEFT JOIN table2
ON table1.id = table2.id
WHERE table1.country = 'USA'
GROUP BY table2.value
I am getting errors on union parameters: 2,1
Read the grammar and other documentation for the expressions you are using. UNION takes two SELECTs as its arguments, and the whole thing can have a final ORDER BY. Here's how your query parses:
    select table1.amount, table1.currency, table1.country, count(*)
    from table1
    LEFT JOIN table1
UNION ALL
    SELECT table2.value
    FROM CHARGES_table2
    LEFT JOIN table2
    ON table1.id = table2.id
    WHERE table1.country = 'USA'
    GROUP BY table2.value
UNION puts the rows of both arguments into one table, so it requires that they have the same number of columns with compatible types. Here the column counts differ: four on the left, one on the right.
The second SELECT also has no table1 in scope, which would be an error on its own, but that is moot given the UNION error.
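The column-count rule is easy to check in isolation. The sketch below uses Python's sqlite3 module with two throwaway tables, a and b, which are invented for the demonstration and unrelated to the tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical tables, only used to show the rule
    CREATE TABLE a (x INTEGER, y INTEGER);
    CREATE TABLE b (z INTEGER);
    INSERT INTO a VALUES (1, 2);
    INSERT INTO b VALUES (3);
""")

# Two columns on the left, one on the right: SQLite rejects the statement.
try:
    conn.execute("SELECT x, y FROM a UNION ALL SELECT z FROM b").fetchall()
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)

# Padding the shorter SELECT so both sides have two columns makes it valid.
ok_rows = conn.execute(
    "SELECT x, y FROM a UNION ALL SELECT z, NULL FROM b ORDER BY 1"
).fetchall()

print(mismatch_error is not None)  # True
print(ok_rows)                     # [(1, 2), (3, None)]
```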

SQLite Nested Query for maximum

I'm trying to use DB Browser for SQLite to construct a nested query to determine the SECOND highest priced item purchased by the top 10 spenders. The query I have to pick out the top 10 spenders is:
SELECT user_id, max(item_total), SUM (item_total + shipping_cost -
discounts_applied) AS total_spent
FROM orders AS o
WHERE payment_reject = "FALSE"
GROUP BY user_id
ORDER BY total_spent DESC
LIMIT 10
This gives the user_id, most expensive item they purchased (not counting shipping or discounts) as well as the total amount they spent on the site.
I was trying to use a nested query to generate a list of the second most expensive items they purchased, but keep getting errors. I've tried
SELECT user_id, MAX(item_total) AS second_highest
FROM orders
WHERE item_total < (SELECT user_id, SUM (item_total + shipping_cost -
discounts_applied) AS total_spent
FROM orders
WHERE payment_reject = "FALSE"
GROUP BY user_id
ORDER BY total_spent DESC
LIMIT 10)
group by user_id
I keep getting a row value misused error. Does anyone have pointers on this nested query or know of another way to find the second highest item purchased from within the group found in the first query?
Thanks!
(Note: The following assumes you're using SQLite 3.25 or newer, since it uses window functions.)
This will return the second-largest item_total for each user_id without duplicates:
WITH ranked AS
(SELECT DISTINCT user_id, item_total
, dense_rank() OVER (PARTITION BY user_id ORDER BY item_total DESC) AS ranking
FROM orders)
SELECT user_id, item_total FROM ranked WHERE ranking = 2;
You can combine it with your original query with something like:
WITH ranked AS
(SELECT DISTINCT user_id, item_total
, dense_rank() OVER (PARTITION BY user_id ORDER BY item_total DESC) AS ranking
FROM orders),
totals AS
(SELECT user_id
, sum (item_total + shipping_cost - discounts_applied) AS total_spent
FROM orders
WHERE payment_reject = 0
GROUP BY user_id)
SELECT t.user_id, r.item_total, t.total_spent
FROM totals AS t
JOIN ranked AS r ON t.user_id = r.user_id
WHERE r.ranking = 2
ORDER BY t.total_spent DESC, t.user_id
LIMIT 10;
Okay, after fixing your table definition to better reflect the values being stored in it and the stated problem, fixing and extending the data so you can actually get results, and adding an optional but useful index, like so:
CREATE TABLE orders (order_id INTEGER PRIMARY KEY
, user_id INTEGER
, item_total REAL
, shipping_cost NUMERIC
, discounts_applied NUMERIC
, payment_reject INTEGER);
INSERT INTO orders(user_id, item_total, shipping_cost, discounts_applied
, payment_reject) VALUES (9852,60.69,10,0,FALSE),
(2784,123.91,15,0,FALSE), (1619,119.75,15,0,FALSE), (9725,151.92,15,0,FALSE),
(8892,153.27,15,0,FALSE), (7105,156.86,25,0,FALSE), (4345,136.09,15,0,FALSE),
(7779,134.93,15,0,FALSE), (3874,157.27,15,0,FALSE), (5102,108.3,10,0,FALSE),
(3098,59.97,10,0,FALSE), (6584,124.92,15,0,FALSE), (5136,111.06,10,0,FALSE),
(1869,113.44,20,0,FALSE), (3830,129.63,15,0,FALSE), (9852,70.69,10,0,FALSE),
(2784,134.91,15,0,FALSE), (1619,129.75,15,0,FALSE), (9725,161.92,15,0,FALSE),
(8892,163.27,15,0,FALSE), (7105,166.86,25,0,FALSE), (4345,146.09,15,0,FALSE),
(7779,144.93,15,0,FALSE), (3874,167.27,15,0,FALSE), (5102,118.3,10,0,FALSE),
(3098,69.97,10,0,FALSE), (6584,134.92,15,0,FALSE), (5136,121.06,10,0,FALSE),
(1869,123.44,20,0,FALSE), (3830,139.63,15,0,FALSE);
CREATE INDEX orders_idx_1 ON orders(user_id, item_total DESC);
the above query will give:
user_id item_total total_spent
---------- ---------- -----------
7105 156.86 373.72
3874 157.27 354.54
8892 153.27 346.54
9725 151.92 343.84
4345 136.09 312.18
7779 134.93 309.86
3830 129.63 299.26
6584 124.92 289.84
2784 123.91 288.82
1619 119.75 279.5
(If you get a syntax error from the query now, it's because you're using an old version of SQLite that doesn't support window functions.)
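The reason for combining DISTINCT with dense_rank() shows up when a user's highest price is tied. This is a minimal sketch with Python's sqlite3 module and a made-up user who bought a 50.0 item twice; it assumes SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (user_id INTEGER, item_total REAL);
    -- hypothetical user 1: the top price 50.0 appears twice
    INSERT INTO orders VALUES (1, 50.0), (1, 50.0), (1, 30.0), (1, 10.0);
""")

query = """
    WITH ranked AS
      (SELECT DISTINCT user_id, item_total
            , {fn}() OVER (PARTITION BY user_id ORDER BY item_total DESC) AS ranking
       FROM orders)
    SELECT user_id, item_total FROM ranked WHERE ranking = 2
"""

# dense_rank() numbers distinct prices 1, 2, 3, so rank 2 is the second price.
with_dense_rank = conn.execute(query.format(fn="dense_rank")).fetchall()

# rank() leaves a gap after a tie (1, 1, 3), so no row has rank 2 here.
with_rank = conn.execute(query.format(fn="rank")).fetchall()

print(with_dense_rank)  # [(1, 30.0)]
print(with_rank)        # []
```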

Time Difference between query result rows in SQLite: How To?

Consider the following reviews table contents:
CustomerName ReviewDT
Doe,John 2011-06-20 10:13:24
Doe,John 2011-06-20 10:54:45
Doe,John 2011-06-20 11:36:34
Doe,Janie 2011-06-20 05:15:12
The results are ordered by ReviewDT and grouped by CustomerName, such as:
SELECT
CustomerName,
ReviewDT
FROM
Reviews
WHERE
CustomerName NOT NULL
ORDER BY CustomerName ASC, ReviewDT ASC;
I'd like to add a column with the time difference between consecutive rows of this query for each customer. rowid identifies the original row, but there is no pattern in which rowids are included.
For the first entry for a CustomerName, the value would be 0. I am asking here in case this can be calculated as part of the original query. If not, I was planning to do it with a series of queries: first creating a new table from the results of the query above, then altering it to add the new column, and using UPDATE/strftime to get the time differences via rowid-1 (somehow)...
To compute the seconds elapsed from one ReviewDT row to the next:
SELECT q.CustomerName, q.ReviewDT,
strftime('%s',q.ReviewDT)
- strftime('%s',coalesce((select r.ReviewDT from Reviews as r
where r.CustomerName = q.CustomerName
and r.ReviewDT < q.ReviewDT
order by r.ReviewDT DESC limit 1),
q.ReviewDT))
FROM Reviews as q WHERE q.CustomerName NOT NULL
ORDER BY q.CustomerName ASC, q.ReviewDT ASC;
To get each ReviewDT alongside the preceding ReviewDT for the same CustomerName:
SELECT q.CustomerName, q.ReviewDT,
coalesce((select r.ReviewDT from Reviews as r
where r.CustomerName = q.CustomerName
and r.ReviewDT < q.ReviewDT
order by r.ReviewDT DESC limit 1),
q.ReviewDT)
FROM Reviews as q WHERE q.CustomerName NOT NULL
ORDER BY q.CustomerName ASC, q.ReviewDT ASC;
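Running the seconds-elapsed query against the four sample rows from the question gives a quick check of the result. The sketch uses Python's sqlite3 module; the column alias seconds_since_previous is added here for readability and is not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Reviews (CustomerName TEXT, ReviewDT TEXT);
    INSERT INTO Reviews VALUES
        ('Doe,John',  '2011-06-20 10:13:24'),
        ('Doe,John',  '2011-06-20 10:54:45'),
        ('Doe,John',  '2011-06-20 11:36:34'),
        ('Doe,Janie', '2011-06-20 05:15:12');
""")

# For each row, the correlated subquery finds the latest earlier review by the
# same customer; coalesce falls back to the row's own time, which yields 0.
rows = conn.execute("""
    SELECT q.CustomerName, q.ReviewDT,
           strftime('%s', q.ReviewDT)
           - strftime('%s', coalesce((SELECT r.ReviewDT FROM Reviews AS r
                                      WHERE r.CustomerName = q.CustomerName
                                        AND r.ReviewDT < q.ReviewDT
                                      ORDER BY r.ReviewDT DESC LIMIT 1),
                                     q.ReviewDT)) AS seconds_since_previous
    FROM Reviews AS q
    WHERE q.CustomerName NOT NULL
    ORDER BY q.CustomerName ASC, q.ReviewDT ASC
""").fetchall()

for row in rows:
    print(row)
# ('Doe,Janie', '2011-06-20 05:15:12', 0)
# ('Doe,John', '2011-06-20 10:13:24', 0)
# ('Doe,John', '2011-06-20 10:54:45', 2481)
# ('Doe,John', '2011-06-20 11:36:34', 2509)
```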

Resources