Query returning incorrect count

I have the following query.
SELECT a.link_field1 AS journo, count(a.link_id) as articles, AVG( b.vote_value ) AS score
FROM dan_links a
LEFT JOIN dan_votes b ON link_id = vote_link_id
WHERE link_field1 <> '' and link_status NOT IN ('discard', 'spam', 'page')
GROUP BY link_field1
ORDER BY link_field1, link_id
This query is returning a count of 3 for the first item in the list. What should be returned is
Journo | count | score
John S | 2 | 6.00
Joe B | 1 | 4
However for the first one John S, it returns a count of 3.
If I directly query
select * from dan_links where link_field1 = 'John S'
I get 2 records returned, as I would expect. I can't for the life of me figure out why the count is wrong, unless for some reason it is counting the records from the dan_votes table.
How can I get the correct count, or is my query completely wrong?
EDIT: Contents of the tables
dan_links
link_id | link_field1 | link | source | link_status
1 | John S | http://test.com | test.com | approved
2 | John S | http://google.com | google | approved
3 | Joe B | http://facebook.com | facebook | approved
dan_votes
vote_id | link_id | vote_value
1 | 1 | 5
2 | 1 | 8
3 | 2 | 4
4 | 3 | 1
EDIT: it looks like it is counting the rows in the votes table for some reason

When you do a left outer join with the condition link_id = vote_link_id, one row is created for every matching vote, something like:
link_id | link_field1 | link | source | link_status|vote_id|vote_value
1 | John S | http://test.com | test.com | approved|1|5
1 | John S | http://test.com | test.com | approved|2|8
2 | John S | http://google.com | google | approved|3|4
3 | Joe B | http://facebook.com | facebook | approved|4|1
Now when you group by link_field1, you get a count of 3 for John S, because count() counts the joined rows rather than the distinct links.
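The row multiplication described above can be reproduced with a minimal sketch using Python's built-in sqlite3 module. The schema is an assumption reduced to the columns the query touches, and the join column is named vote_link_id as in the question's query. COUNT(DISTINCT ...) is shown only as one possible way to count links instead of joined rows; it is not the approach this answer goes on to use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dan_links (link_id INTEGER, link_field1 TEXT, link_status TEXT);
CREATE TABLE dan_votes (vote_id INTEGER, vote_link_id INTEGER, vote_value INTEGER);
INSERT INTO dan_links VALUES
    (1, 'John S', 'approved'), (2, 'John S', 'approved'), (3, 'Joe B', 'approved');
INSERT INTO dan_votes VALUES (1, 1, 5), (2, 1, 8), (3, 2, 4), (4, 3, 1);
""")

# Counting joined rows: link 1 has two votes, so it appears twice.
fanout_counts = dict(conn.execute("""
    SELECT a.link_field1, COUNT(a.link_id)
    FROM dan_links a
    LEFT JOIN dan_votes b ON a.link_id = b.vote_link_id
    GROUP BY a.link_field1
""").fetchall())

# Counting distinct links: each link is counted once per journalist.
distinct_counts = dict(conn.execute("""
    SELECT a.link_field1, COUNT(DISTINCT a.link_id)
    FROM dan_links a
    LEFT JOIN dan_votes b ON a.link_id = b.vote_link_id
    GROUP BY a.link_field1
""").fetchall())

print(fanout_counts)
print(distinct_counts)
```

With the sample data, the first query counts 3 for John S and the second counts 2.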
A nested query might work:
SELECT journo, count(linkid) as articles, AVG(score) AS score FROM
(SELECT a.link_field1 AS journo, AVG( b.vote_value ) AS score, a.link_id as linkid
FROM dan_links a
LEFT JOIN dan_votes b
ON link_id = vote_link_id
WHERE link_field1 <> ''
and link_status NOT IN ('discard', 'spam', 'page')
GROUP BY a.link_id, a.link_field1) t -- derived table needs an alias in some databases
GROUP BY journo
ORDER BY journo
The above query will give an incorrect average, because an average of per-link averages is not the overall average: ((n1+n2)/2+n3)/2 != (n1+n2+n3)/3. So use the query below instead:
SELECT journo, count(linkid) as articles, SUM(vote_sum)/SUM(count_on_id) AS score
FROM
-- count_on_id counts votes per link; aggregates cannot be nested, so sum it in the outer query
(SELECT a.link_field1 AS journo, SUM( b.vote_value ) AS vote_sum, a.link_id as linkid, count(b.vote_value) as count_on_id
FROM dan_links a
LEFT JOIN dan_votes b
ON link_id = vote_link_id
WHERE link_field1 <> ''
and link_status NOT IN ('discard', 'spam', 'page')
GROUP BY a.link_id, a.link_field1) t
GROUP BY journo
ORDER BY journo
Hope this helps.
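As a rough check of the averaging argument, here is a sketch that computes both the average of per-link averages and the overall average (sum of votes divided by number of votes) on the sample data. It runs on SQLite through Python's sqlite3 module; the reduced schema, the vote_link_id column name, and the per_link alias are assumptions for illustration. The `* 1.0` forces floating-point division, which SQLite would otherwise perform as integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dan_links (link_id INTEGER, link_field1 TEXT);
CREATE TABLE dan_votes (vote_id INTEGER, vote_link_id INTEGER, vote_value INTEGER);
INSERT INTO dan_links VALUES (1, 'John S'), (2, 'John S'), (3, 'Joe B');
INSERT INTO dan_votes VALUES (1, 1, 5), (2, 1, 8), (3, 2, 4), (4, 3, 1);
""")

rows = conn.execute("""
    SELECT journo,
           COUNT(link_id)                      AS articles,
           AVG(link_avg)                       AS avg_of_avgs,
           SUM(vote_sum) * 1.0 / SUM(vote_cnt) AS overall_avg
    FROM (
        -- one row per link, so the outer COUNT counts links, not votes
        SELECT a.link_field1 AS journo, a.link_id,
               AVG(b.vote_value)   AS link_avg,
               SUM(b.vote_value)   AS vote_sum,
               COUNT(b.vote_value) AS vote_cnt
        FROM dan_links a
        LEFT JOIN dan_votes b ON a.link_id = b.vote_link_id
        GROUP BY a.link_id, a.link_field1
    ) AS per_link
    GROUP BY journo
    ORDER BY journo
""").fetchall()

result = {journo: (articles, avg_of_avgs, overall_avg)
          for journo, articles, avg_of_avgs, overall_avg in rows}
print(result)
```

For John S the article count is 2 in both cases, but the two averages differ: (6.5 + 4) / 2 = 5.25 versus (5 + 8 + 4) / 3.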

Related

Sqlite / populate new column that ranks the existing rows

I've a SQLite database table with the following columns:
| day | place | visitors |
-------------------------------------
| 2021-05-01 | AAA | 20 |
| 2021-05-01 | BBB | 10 |
| 2021-05-01 | CCC | 3 |
| 2021-05-02 | AAA | 5 |
| 2021-05-02 | BBB | 7 |
| 2021-05-02 | CCC | 2 |
Now I would like to introduce a column 'rank' which indicates the rank according to the visitors each day. Expected table would look like:
| day | place | visitors | Rank |
------------------------------------------
| 2021-05-01 | AAA | 20 | 1 |
| 2021-05-01 | BBB | 10 | 2 |
| 2021-05-01 | CCC | 3 | 3 |
| 2021-05-02 | AAA | 5 | 2 |
| 2021-05-02 | BBB | 7 | 1 |
| 2021-05-02 | CCC | 2 | 3 |
Populating the data for the new column Rank can be done with a program like (Pseudocode).
for each i_day in all_days:
    SELECT
        ROW_NUMBER () OVER (ORDER BY `visitors` DESC) Day_Rank, place
    FROM mytable
    WHERE `day` = 'i_day'
    for each i_place in all_places:
        UPDATE mytable
        SET rank= Day_Rank
        WHERE `Day`='i_day'
        AND place = 'i_place'
Since this line by line update is quite inefficient, I'm searching how to optimize this with a SQL sub query in combination with the UPDATE.
(does not work so far...)
for each i_day in all_days:
    UPDATE mytable
    SET rank= (
        SELECT
            ROW_NUMBER () OVER (ORDER BY `visitors` DESC) Day_Rank
        FROM mytable
        WHERE `day` = 'i_day'
    )
Typically, this can be done with a subquery that counts the number of rows with visitors greater than the value of visitors of the current row:
UPDATE mytable
SET Day_Rank = (
SELECT COUNT(*) + 1
FROM mytable m
WHERE m.day = mytable.day AND m.visitors > mytable.visitors
);
Note that the result is actually what RANK() would return, if there are ties in the values of visitors.
See the demo.
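To try the counting subquery locally, a minimal sketch with Python's sqlite3 module follows. The table definition is an assumption (the question does not show column types), and the rows for 2021-05-03 are hypothetical, added only to show how ties are ranked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (day TEXT, place TEXT, visitors INTEGER, Day_Rank INTEGER);
INSERT INTO mytable (day, place, visitors) VALUES
    ('2021-05-01', 'AAA', 20), ('2021-05-01', 'BBB', 10), ('2021-05-01', 'CCC', 3),
    ('2021-05-02', 'AAA', 5),  ('2021-05-02', 'BBB', 7),  ('2021-05-02', 'CCC', 2),
    -- hypothetical day with a tie between AAA and BBB
    ('2021-05-03', 'AAA', 9),  ('2021-05-03', 'BBB', 9),  ('2021-05-03', 'CCC', 1);
""")

# Rank = 1 + number of rows on the same day with strictly more visitors.
conn.execute("""
    UPDATE mytable
    SET Day_Rank = (
        SELECT COUNT(*) + 1
        FROM mytable m
        WHERE m.day = mytable.day AND m.visitors > mytable.visitors
    )
""")

ranks = {(day, place): rank
         for day, place, rank in conn.execute("SELECT day, place, Day_Rank FROM mytable")}
print(ranks)
```

On the tied day both AAA and BBB get rank 1 and CCC gets rank 3, which is the RANK() behaviour mentioned above rather than the ROW_NUMBER() behaviour.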
Or, you could calculate the rankings with ROW_NUMBER() in a CTE and use it in a subquery:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY day ORDER BY visitors DESC) rn
FROM mytable
)
UPDATE mytable
SET Day_Rank = (SELECT rn FROM cte c WHERE (c.day, c.place) = (mytable.day, mytable.place));
See the demo.
Or, if your version of SQLite is 3.33.0+ you can use the join-like UPDATE...FROM... syntax:
UPDATE mytable AS m
SET Day_Rank = t.rn
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY day ORDER BY visitors DESC) rn
FROM mytable
) t
WHERE (t.day, t.place) = (m.day, m.place);
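Since the UPDATE...FROM form depends on the SQLite version, a script can check the version of the linked library first. The sketch below, using Python's sqlite3 module, assumes a table definition that the question does not show. It uses UPDATE...FROM when available and otherwise falls back to the counting subquery, which gives the same ranks here because the sample data has no ties.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (day TEXT, place TEXT, visitors INTEGER, Day_Rank INTEGER);
INSERT INTO mytable (day, place, visitors) VALUES
    ('2021-05-01', 'AAA', 20), ('2021-05-01', 'BBB', 10), ('2021-05-01', 'CCC', 3),
    ('2021-05-02', 'AAA', 5),  ('2021-05-02', 'BBB', 7),  ('2021-05-02', 'CCC', 2);
""")

# sqlite_version_info describes the SQLite library, not the Python module.
if sqlite3.sqlite_version_info >= (3, 33, 0):
    conn.execute("""
        UPDATE mytable AS m
        SET Day_Rank = t.rn
        FROM (
            SELECT day, place,
                   ROW_NUMBER() OVER (PARTITION BY day ORDER BY visitors DESC) AS rn
            FROM mytable
        ) AS t
        WHERE (t.day, t.place) = (m.day, m.place)
    """)
else:
    # Fallback that needs neither window functions nor UPDATE...FROM.
    conn.execute("""
        UPDATE mytable
        SET Day_Rank = (
            SELECT COUNT(*) + 1
            FROM mytable m
            WHERE m.day = mytable.day AND m.visitors > mytable.visitors
        )
    """)

ranked = conn.execute(
    "SELECT day, place, Day_Rank FROM mytable ORDER BY day, place"
).fetchall()
print(ranked)
```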

SQLite : DELETE on multiple WHERE criteria from query

I have a SQLite database where I need to delete records from a many-to-many table, based on query results where 2 criteria for each row must be met.
As an example, take 2 tables :
oldEvents
user_id | event_id
---------+----------
1 | aaa
2 | aaa
2 | bbb
3 | bbb
1 | ccc
3 | ccc
<select_query>
qry_user_id | qry_event_id
-------------+-------------
2 | aaa
3 | bbb
1 | ccc
From table oldEvents, I want to delete each row that appears in the query, so as to end up with:
oldEvents
user_id | event_id
---------+----------
1 | aaa
2 | bbb
3 | ccc
Until now, I use a cumbersome DELETE query that concatenates the qry_user_id and qry_event_id, and uses them in an EXISTS sub-clause :
DELETE FROM oldEvents
WHERE EXISTS
(
SELECT user_id||event_id AS deleteCombo
FROM oldEvents
WHERE deleteCombo IN
(
SELECT qry_user_id||qry_event_id
FROM
<select_query>
)
)
It works, but is hardly readable, and wouldn't scale once more variables enter the scene.
I can't repeat the select_query inline in an AND-clause, because it is itself a rather complicated query (triple JOIN).
I could write the query data to a temporary table, but would rather not do that.
Does anyone have a suggestion on how to write a DELETE that accepts multiple WHERE criteria from a query?
Enclose your query inside a CTE like this:
WITH cte(user_id, event_id) AS (
<your query here>
)
DELETE FROM oldEvents
WHERE (user_id, event_id) IN (SELECT user_id, event_id FROM cte);
See the demo.
Results:
SELECT * FROM oldEvents;
| user_id | event_id |
| ------- | -------- |
| 1 | aaa |
| 2 | bbb |
| 3 | ccc |
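The CTE-based DELETE can be tried with a small sketch using Python's sqlite3 module. The real select_query is not shown in the question, so a VALUES list stands in for it here; that stand-in and the column types are assumptions. Row-value comparisons such as (user_id, event_id) IN (...) need SQLite 3.15.0 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE oldEvents (user_id INTEGER, event_id TEXT);
INSERT INTO oldEvents VALUES
    (1, 'aaa'), (2, 'aaa'), (2, 'bbb'), (3, 'bbb'), (1, 'ccc'), (3, 'ccc');
""")

conn.execute("""
    WITH cte(user_id, event_id) AS (
        -- placeholder for the real (complicated) select query
        VALUES (2, 'aaa'), (3, 'bbb'), (1, 'ccc')
    )
    DELETE FROM oldEvents
    WHERE (user_id, event_id) IN (SELECT user_id, event_id FROM cte)
""")

remaining = conn.execute(
    "SELECT user_id, event_id FROM oldEvents ORDER BY user_id, event_id"
).fetchall()
print(remaining)
```

Comparing the columns as a pair avoids the string concatenation from the question, and adding a third criterion only means adding one more column on each side.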

SQLite - Update a column based on values from two other tables' columns

I am trying to update Data1's ID to Record2's ID when:
Record1's and Record2's Name are the same, and
Weight is greater in Record2.
Record1
| ID | Weight | Name |
|----|--------|------|
| 1 | 10 | a |
| 2 | 10 | b |
| 3 | 10 | c |
Record2
| ID | Weight | Name |
|----|--------|------|
| 4 | 20 | a |
| 5 | 20 | b |
| 6 | 20 | c |
Data1
| ID | Weight |
|----|--------|
| 4 | 40 |
| 5 | 40 |
I have tried the following SQLite query:
update data1
set id =
(select record2.id
from record2,record1
where record1.name=record2.name
and record1.weight<record2.weight)
where id in
(select record1.id
from record1, record2
where record1.name=record2.name
and record1.weight<record2.weight)
Using the above query Data1's id is updated to 4 for all records.
NOTE: Record1's ID is the foreign key for Data1.
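For the given data set, the following seems to do the job. The subquery in SET is now correlated with the row being updated (data1.id = record1.id), so each row gets its own matching Record2 ID: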
update data1
set id =
(select record2.id
from record2,record1
where
data1.id = record1.id
and record1.name=record2.name
and record1.weight<record2.weight)
where id in
(select record1.id
from record1, record2
where
record1.id in (select id from data1)
and record1.name=record2.name
and record1.weight<record2.weight)
;
See it in action: SQL Fiddle.
Please comment if this requires adjustment or further detail.
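A sketch of the correlated update on the sample data, using Python's sqlite3 module. It assumes that Data1 holds the Record1 IDs 1 and 2 before the update (the question only states that Record1's ID is the foreign key), and it adds a hypothetical row with ID 99 that has no match, to show that the WHERE clause leaves such rows alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record1 (id INTEGER, weight INTEGER, name TEXT);
CREATE TABLE record2 (id INTEGER, weight INTEGER, name TEXT);
CREATE TABLE data1 (id INTEGER, weight INTEGER);
INSERT INTO record1 VALUES (1, 10, 'a'), (2, 10, 'b'), (3, 10, 'c');
INSERT INTO record2 VALUES (4, 20, 'a'), (5, 20, 'b'), (6, 20, 'c');
-- assumed starting state; 99 is a hypothetical row with no matching record
INSERT INTO data1 VALUES (1, 40), (2, 40), (99, 40);
""")

conn.execute("""
    UPDATE data1
    SET id = (
        SELECT record2.id
        FROM record2, record1
        WHERE data1.id = record1.id          -- correlation with the updated row
          AND record1.name = record2.name
          AND record1.weight < record2.weight
    )
    WHERE id IN (
        SELECT record1.id
        FROM record1, record2
        WHERE record1.id IN (SELECT id FROM data1)
          AND record1.name = record2.name
          AND record1.weight < record2.weight
    )
""")

updated_ids = [row[0] for row in conn.execute("SELECT id FROM data1 ORDER BY id")]
print(updated_ids)
```

Without the outer WHERE clause, the unmatched row would have its id set to NULL, because the scalar subquery returns no row for it.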

How to join these two tables properly?

I have these two tables
customers:
id | name | address
---+---------------------+-------------------------
1 | company 1 | some address information
2 | company 2 | another address
3 | yet another company | no address here
orders:
id | customer_id | date
---+-------------+---------
1 | 2 | 20151209
2 | 2 | 20151211
3 | 3 | 20151210
4 | 1 | 20151223
Now I want to get a resulting table with each customer on the left and the number of orders within an arbitrary period of time on the right.
For example, given this period to be 20151207 <= date <= 20151211, the resulting table should look like this:
name | orders count
--------------------+-------------
company 1 | 0
company 2 | 2
yet another company | 1
Note: date = 20151207 means the 7th of December 2015.
How do I join them?
SELECT c.name,
COUNT(CASE WHEN ((o.date BETWEEN 20151207 AND 20151211) OR (o.date ISNULL)) THEN o.customer_id END) AS "Total Sales"
FROM customers AS c
LEFT JOIN orders o ON c.id == o.customer_id
GROUP BY c.name
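The sketch below runs a conditional count like the one above on the sample data with Python's sqlite3 module, and compares it with two variations that are not part of the answer: moving the date filter into the ON clause (which appears to give the same result) and moving it into WHERE (which drops customers without orders in the period). Column types, the ORDER BY, and the variable names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER, name TEXT, address TEXT);
CREATE TABLE orders (id INTEGER, customer_id INTEGER, date INTEGER);
INSERT INTO customers VALUES
    (1, 'company 1', 'some address information'),
    (2, 'company 2', 'another address'),
    (3, 'yet another company', 'no address here');
INSERT INTO orders VALUES
    (1, 2, 20151209), (2, 2, 20151211), (3, 3, 20151210), (4, 1, 20151223);
""")
period = (20151207, 20151211)

# Conditional count: CASE yields NULL outside the period and COUNT skips NULLs.
case_counts = conn.execute("""
    SELECT c.name,
           COUNT(CASE WHEN o.date BETWEEN ? AND ? THEN o.customer_id END)
    FROM customers c
    LEFT JOIN orders o ON c.id = o.customer_id
    GROUP BY c.name ORDER BY c.name
""", period).fetchall()

# Filter in the ON clause: unmatched customers survive with NULL order columns.
on_counts = conn.execute("""
    SELECT c.name, COUNT(o.id)
    FROM customers c
    LEFT JOIN orders o ON c.id = o.customer_id AND o.date BETWEEN ? AND ?
    GROUP BY c.name ORDER BY c.name
""", period).fetchall()

# Filter in WHERE: rows for customers without orders in the period are removed.
where_counts = conn.execute("""
    SELECT c.name, COUNT(o.id)
    FROM customers c
    LEFT JOIN orders o ON c.id = o.customer_id
    WHERE o.date BETWEEN ? AND ?
    GROUP BY c.name ORDER BY c.name
""", period).fetchall()

print(case_counts, on_counts, where_counts, sep="\n")
```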

distinct sum does not distinct values

I have 2 tables, reservations and articles:
Reservations
------------------------------
Id | Name | City |
------------------------------
1 | Mike | Stockholm
2 | Daniel | Gothenburg
2 | Daniel | Gothenburg
3 | Andre | Gothenburg (Majorna)
Articles
-------------------------------------------------------------
ArticleId | Name | Amount | ReservationId |
-------------------------------------------------------------
10 | Coconuts | 1 | 1
10 | Coconuts | 4 | 2
11 | Apples | 2 | 2
12 | Oranges | 2 | 3
I want to select Articles Name and the sum of Articles.Amount per Articles.ArticleId and Reservations.City.
My code:
SELECT distinct r.ID,a.Name as ArticleName,
sum(a.Amount) as ArticlesAmount,
substr(r.City,1,3) as ToCityName
FROM Reservations r
INNER JOIN Articles a
on r.Id = a.ReservationId
WHERE a.Name <> ''
GROUP BY ToCityName,a.ArticleId,a.Name
ORDER BY ToCityName ASC
This gives me following result:
Id | ArticleName | ArticlesAmount | ToCityName
2 | Coconuts | 8 | Got
2 | Apples | 4 | Got
3 | Oranges | 2 | Got
1 | Coconuts | 1 | Sto
But I want:
Id | ArticleName | ArticlesAmount | ToCityName
2 | Coconuts | 4 | Got
2 | Apples | 2 | Got
3 | Oranges | 2 | Got
1 | Coconuts | 1 | Sto
Help would be appreciated, and an explanation please :)
Fiddle
Have a look at SQLFiddle
Code:
SELECT distinct r.ID,a.Name as ArticleName,
sum(distinct a.Amount) as ArticlesAmount,
substr(r.City,1,3) as ToCityName
FROM Reservations r
INNER JOIN Articles a
on r.Id = a.ReservationId
WHERE a.Name <> ''
GROUP BY ToCityName,a.ArticleId,a.Name
ORDER BY ToCityName ASC
You want to ensure you sum the amount by the distinct number of times it appears per group.
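One caveat worth checking: SUM(DISTINCT ...) removes duplicate amounts, not duplicate rows, so it may undercount when two genuinely different rows carry the same amount. The sketch below, using Python's sqlite3 module, compares plain SUM, SUM(DISTINCT), and an alternative that is not from the answer above: deduplicating Reservations in a subquery before joining. The totals helper, the column types, and the extra reservation for 'Eva' are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reservations (Id INTEGER, Name TEXT, City TEXT);
CREATE TABLE Articles (ArticleId INTEGER, Name TEXT, Amount INTEGER, ReservationId INTEGER);
INSERT INTO Reservations VALUES
    (1, 'Mike', 'Stockholm'), (2, 'Daniel', 'Gothenburg'),
    (2, 'Daniel', 'Gothenburg'), (3, 'Andre', 'Gothenburg (Majorna)');
INSERT INTO Articles VALUES
    (10, 'Coconuts', 1, 1), (10, 'Coconuts', 4, 2),
    (11, 'Apples', 2, 2), (12, 'Oranges', 2, 3);
""")

DEDUPED = "(SELECT DISTINCT Id, Name, City FROM Reservations)"

def totals(agg, source):
    """Return {(city prefix, article name): total} for one query variant."""
    sql = f"""
        SELECT substr(r.City, 1, 3) AS ToCityName, a.Name, {agg} AS total
        FROM {source} r
        INNER JOIN Articles a ON r.Id = a.ReservationId
        WHERE a.Name <> ''
        GROUP BY ToCityName, a.ArticleId, a.Name
    """
    return {(city, name): total for city, name, total in conn.execute(sql)}

plain = totals("SUM(a.Amount)", "Reservations")              # doubled by duplicate row
distinct = totals("SUM(DISTINCT a.Amount)", "Reservations")  # duplicates collapsed
deduped = totals("SUM(a.Amount)", DEDUPED)                   # duplicates removed first

# Hypothetical extra reservation in the same city with the same amount of apples.
conn.execute("INSERT INTO Reservations VALUES (4, 'Eva', 'Gothenburg')")
conn.execute("INSERT INTO Articles VALUES (11, 'Apples', 2, 4)")
distinct_after = totals("SUM(DISTINCT a.Amount)", "Reservations")
deduped_after = totals("SUM(a.Amount)", DEDUPED)

print(plain, distinct, deduped, distinct_after, deduped_after, sep="\n")
```

On the original data SUM(DISTINCT) and the deduplicated join agree. After the hypothetical insert, SUM(DISTINCT) still reports 2 apples for 'Got', while the deduplicated join reports 4.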
I joined Articles a second time to select the requested rows. Here is the query:
SELECT DISTINCT
r.ID,
a.`Name` AS ArticleName,
Articles.Amount,
substr(r.City, 1, 3) AS ToCityName
FROM
Reservations r
INNER JOIN Articles a ON r.Id = a.ReservationId
INNER JOIN Articles ON a.ReservationId = Articles.ReservationId
AND a.ArticleId = Articles.ArticleId
WHERE
a.Name <> ''
GROUP BY
ToCityName,
a.ArticleId,
a.Name
ORDER BY
ToCityName ASC

Resources