How to use top 1 from subquery in MariaDB? - mariadb

Let's say I want to get the list of tickets, and for each ticket I want to find out the date of the latest post. In SQL Server I can do it this way:
select
Tickets.*,
(
select top 1 [Date]
from Posts
where TicketId = Tickets.Id
order by [Date] desc
) as LatestPostDate
from Tickets
I realized that MariaDB does not support top 1, and from what I found, limit 1 should be used instead. But this does not work:
select
Tickets.*
(
select `Date`
from Posts
where TicketId = Tickets.Id
order by `Date` desc
limit 1
) as LatestPostDate
from Tickets

You can use LIMIT in a sub-query. You are just missing a comma in your query:
select
Tickets.*, -- this comma was missing
(
select `Date`
from Posts
where TicketId = Tickets.Id
order by `Date` desc
limit 1
) as LatestPostDate
from Tickets
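In fact, a scalar sub-query like this one (a sub-query used as a column in the select list) must return at most one row, so adding LIMIT 1 is a good way to make sure you never get more than one row back.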
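The corrected query can be tried on sample data. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MariaDB (an assumption made so the example is self-contained; SQLite also accepts LIMIT in a correlated sub-query). The sample rows, the Title column, and the helper name latest_post_dates are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MariaDB (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tickets (Id INTEGER PRIMARY KEY, Title TEXT);
    CREATE TABLE Posts (Id INTEGER PRIMARY KEY, TicketId INTEGER, "Date" TEXT);
    INSERT INTO Tickets VALUES (1, 'Login fails'), (2, 'Slow report'), (3, 'No posts yet');
    INSERT INTO Posts (TicketId, "Date") VALUES
        (1, '2021-01-05'), (1, '2021-02-10'), (2, '2021-01-20');
""")

def latest_post_dates(conn):
    """Return each ticket with the date of its latest post (None if it has no posts)."""
    return conn.execute("""
        SELECT Tickets.Id,
               Tickets.Title,
               (
                   SELECT "Date"
                   FROM Posts
                   WHERE TicketId = Tickets.Id
                   ORDER BY "Date" DESC
                   LIMIT 1
               ) AS LatestPostDate
        FROM Tickets
        ORDER BY Tickets.Id
    """).fetchall()

for row in latest_post_dates(conn):
    print(row)
```

A ticket without any posts gets NULL (None in Python) for LatestPostDate, because the scalar sub-query returns no row for it.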

Related

SOQL Query with a Subquery that uses GROUP BY and HAVING throws unknown error

I'm trying to query a list of records from a custom object (SB_User__c) where the value in the Email__c field is not unique.
The following query captures the entire table as expected:
SELECT Id, Name, Email__c, External_Id__c
FROM SB_User__c
ORDER BY Email__c, Name
And my subquery returns a list of Email__c values that are not unique:
SELECT Email__c
FROM SB_User__c
GROUP BY Email__c
HAVING COUNT(Id) > 1
But when these queries are combined, I receive an unknown error:
SELECT Id, Name, Email__c, External_Id__c
FROM SB_User__c
WHERE Email__c IN (
SELECT Email__c FROM SB_User__c
GROUP BY Email__c
HAVING COUNT(Id) > 1)
ORDER BY Email__c, Name
Is there a way to accomplish what I'm trying to without involving apex?
I think the problem is probably in the HAVING clause. I ran a similar query in an online editor for SQL queries:
SELECT * FROM Customers
WHERE CustomerID IN (
SELECT CustomerID FROM Customers
GROUP BY CustomerID
HAVING CustomerID > 50)
ORDER BY Country ASC, CustomerName DESC;
This query runs fine, so check your HAVING clause.
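For comparison, here is the same "rows whose value is not unique" pattern in standard SQL, run through Python's built-in sqlite3 module. This is only a sketch of the pattern in a general-purpose SQL engine; it does not show whether SOQL accepts GROUP BY and HAVING inside an IN sub-query. The table, rows, and helper name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO users VALUES
        (1, 'Ann',   'a@example.com'),
        (2, 'Bob',   'b@example.com'),
        (3, 'Annie', 'a@example.com'),
        (4, 'Cy',    'c@example.com');
""")

def rows_with_duplicate_email(conn):
    """Return every row whose email appears more than once in the table."""
    return conn.execute("""
        SELECT id, name, email
        FROM users
        WHERE email IN (
            SELECT email
            FROM users
            GROUP BY email
            HAVING COUNT(id) > 1
        )
        ORDER BY email, name
    """).fetchall()

for row in rows_with_duplicate_email(conn):
    print(row)
```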

Selecting all max values of column for each distinct value of other column

I am trying to get a list of most used tags for posts on a website on a given day. I currently have this query:
SELECT posts.pdate, tags.tag, count(posts.pid) as post_count
FROM posts, tags
WHERE posts.pid = tags.pid
GROUP BY posts.pdate, tags.tag
ORDER BY posts.pdate;
This gives me each distinct tag, the date it was used on, and how many posts used it:
2020-09-10|CMPUT291|1
2020-09-10|computing|1
2020-09-10|database|2
2020-09-10|frequentTag1|2
2020-09-10|relational|2
2020-09-10|sql|1
2020-09-10|tieTag1|2
2020-09-11|Database|1
2020-09-11|data|1
2020-09-11|relational|1
2020-09-11|sql|1
2020-09-13|Database|1
2020-09-13|Sql language|1
2020-09-13|access|1
2020-09-13|frequentTag3|2
2020-09-13|query|3
2020-09-13|relational|3
2020-09-13|sql|1
2020-09-17|Database|1
2020-09-17|frequentTag3|3
2020-09-17|query|1
2020-09-17|relational|1
2020-09-17|sql|1
2020-09-17|sql language|1
2020-09-20|RELATIONAL|1
2020-09-20|database|1
2020-09-20|query|1
2020-09-20|sql language|1
2020-09-25|database|1
2020-09-25|sql language|1
2020-09-30|boring|2
2020-09-30|extra tag|1
2020-09-30|fun|3
2020-09-30|just here|1
2020-09-30|more tag|1
2020-09-30|sleep|3
2020-09-30|tag tag|1
2020-09-30|tag test|1
2020-09-30|test tag|1
But, I now need to make it only give me the rows that have the max (or all of them with max in case of a tie) for each date.
I WANT to be able to use MAX(count(posts.pid)) but I know that doesn't work so I need to find an alternative.
I should get a final result of this:
2020-09-10|database|2
2020-09-10|frequentTag1|2
2020-09-10|relational|2
2020-09-10|tieTag1|2
2020-09-11|Database|1
2020-09-11|data|1
2020-09-11|relational|1
2020-09-11|sql|1
2020-09-13|query|3
2020-09-13|relational|3
2020-09-17|frequentTag3|3
2020-09-20|RELATIONAL|1
2020-09-20|database|1
2020-09-20|query|1
2020-09-20|sql language|1
2020-09-25|database|1
2020-09-25|sql language|1
2020-09-30|fun|3
2020-09-30|sleep|3
Any help would be greatly appreciated.
APPLICABLE SCHEMA:
create table posts (
pid char(4),
pdate date,
title text,
body text,
poster char(4),
primary key (pid),
foreign key (poster) references users
);
create table tags (
pid char(4),
tag text,
primary key (pid,tag),
foreign key (pid) references posts
);
You can use RANK() window function:
SELECT pdate, tag, post_count
FROM (
SELECT p.pdate,
t.tag,
COUNT(*) post_count,
RANK() OVER (PARTITION BY p.pdate ORDER BY COUNT(*) DESC) rnk
FROM posts p INNER JOIN tags t
ON p.pid = t.pid
GROUP BY p.pdate, t.tag
)
WHERE rnk = 1
ORDER BY pdate, tag;
You should use a proper JOIN with an ON clause instead of that outdated syntax with the WHERE clause.
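The RANK() approach can be checked on a small data set. The sketch below uses Python's built-in sqlite3 module and assumes the bundled SQLite is version 3.25 or later, since that is when window functions were added. The tables are cut down to the columns the query needs, and the sample rows and the derived-table alias ranked are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (pid TEXT PRIMARY KEY, pdate TEXT);
    CREATE TABLE tags (pid TEXT, tag TEXT, PRIMARY KEY (pid, tag));
    INSERT INTO posts VALUES
        ('p1', '2020-09-10'), ('p2', '2020-09-10'), ('p3', '2020-09-11');
    INSERT INTO tags VALUES
        ('p1', 'database'), ('p2', 'database'),
        ('p1', 'relational'), ('p2', 'relational'),
        ('p1', 'sql'),
        ('p3', 'data'), ('p3', 'sql');
""")

def top_tags_per_day(conn):
    """Return the most used tag(s) for each date, keeping all ties."""
    return conn.execute("""
        SELECT pdate, tag, post_count
        FROM (
            SELECT p.pdate,
                   t.tag,
                   COUNT(*) AS post_count,
                   RANK() OVER (PARTITION BY p.pdate ORDER BY COUNT(*) DESC) AS rnk
            FROM posts p INNER JOIN tags t
                ON p.pid = t.pid
            GROUP BY p.pdate, t.tag
        ) AS ranked
        WHERE rnk = 1
        ORDER BY pdate, tag
    """).fetchall()

for row in top_tags_per_day(conn):
    print(row)
```

RANK() gives tied rows the same rank, which is why every tag sharing the highest count for a date is kept. ROW_NUMBER() would keep only one of them.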

Finding the oldest customers in a sql database

I'm trying to find the oldest person in a SQL database that has the following configuration:
Customers (
cardNo INTEGER PRIMARY KEY,
first TEXT,
last TEXT,
sex CHAR,
dob DATE
)
I'm trying to find the oldest customers in the database, of which there are 28 (they have the same dob). I'm not sure how to get multiple results from the min() keyword.
You can do this with a subquery.
Something like:
SELECT first, last FROM Customers
WHERE
dob = (SELECT MIN(dob) FROM Customers);
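As a quick check that this returns every customer sharing the earliest dob rather than a single row, here is a minimal sketch using Python's built-in sqlite3 module. The sample customers are made up, and the column names first and last are quoted only as a precaution because they are also SQL keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        cardNo INTEGER PRIMARY KEY,
        "first" TEXT,
        "last" TEXT,
        sex CHAR,
        dob DATE
    );
    INSERT INTO Customers VALUES
        (1, 'Ada', 'Low', 'F', '1930-01-01'),
        (2, 'Ben', 'Kay', 'M', '1930-01-01'),
        (3, 'Cal', 'Moe', 'M', '1950-06-15');
""")

def oldest_customers(conn):
    """Return all customers whose dob equals the earliest dob in the table."""
    return conn.execute("""
        SELECT "first", "last"
        FROM Customers
        WHERE dob = (SELECT MIN(dob) FROM Customers)
        ORDER BY cardNo
    """).fetchall()

for row in oldest_customers(conn):
    print(row)
```

The sub-query yields one scalar value, the minimum dob, and the outer WHERE then matches every row that has that value.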
MIN() and MAX() are aggregate functions, which means each of them returns a single scalar value.
More info on aggregate functions can be found here: Aggregate functions info
To solve your problem, the query should look like this:
MS SQL
SELECT
c.first,
c.last
FROM Customers c
WHERE c.dob IS NOT NULL
AND c.dob = (
SELECT TOP 1 cc.dob
FROM Customers cc
WHERE cc.dob IS NOT NULL
ORDER BY cc.dob
)
ORDER BY c.dob
SQLite
SELECT
c.first,
c.last
FROM Customers c
WHERE c.dob IS NOT NULL
AND c.dob = (
SELECT cc.dob
FROM Customers cc
WHERE cc.dob IS NOT NULL
ORDER BY cc.dob
LIMIT 1
)
ORDER BY c.dob
I think it will still need optimization. Hope this helps. :)

Google BigQuery - Updating nested Revenue fields

I tried to apply the solution in Google BigQuery - Updating a nested repeated field to the field hits.transaction.transactionRevenue, but I receive error message:
Scalar subquery produced more than one element
I have tried to run the following query:
UPDATE `project_id.dataset_id.table`
SET hits = ARRAY(
SELECT AS STRUCT * REPLACE (
(SELECT AS STRUCT transaction.* REPLACE (1 AS transactionRevenue)) AS transaction
)
FROM UNNEST(hits) as transactionRevenue
)
WHERE (select h.transaction.transactionId from unnest(hits) as h) LIKE 'ABC123XYZ'
Are there any obvious mistakes on my part? Would be great if anyone could share some tips or experiences that could help me with this.
What I basically want to do is to set the revenue of a specific transaction to 1.
Many thanks in advance,
David
This is the problem:
WHERE (select h.transaction.transactionId from unnest(hits) as h) LIKE 'ABC123XYZ'
If there is more than one hit in the array, this will cause the error that you are seeing. You probably want this instead:
WHERE EXISTS (select 1 from unnest(hits) as h WHERE h.transaction.transactionId LIKE 'ABC123XYZ')
But note that your UPDATE will now overwrite the revenue of every hit in any row where this condition is true. To change only the matching transaction, apply the condition to each element inside the ARRAY function call instead. Use IF rather than a WHERE inside the ARRAY subquery, because a WHERE there would drop the non-matching hits from the array:
UPDATE `project_id.dataset_id.table`
SET hits = ARRAY(
SELECT AS STRUCT * REPLACE (
(SELECT AS STRUCT transaction.* REPLACE (
IF(transaction.transactionId LIKE 'ABC123XYZ', 1, transaction.transactionRevenue) AS transactionRevenue
)) AS transaction
)
FROM UNNEST(hits) as h
)
WHERE true
Now every hit is kept, and the revenue changes only for hits with a transaction ID matching the pattern.
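Where the condition is placed matters when an array is rebuilt element by element: a filter on the elements drops the non-matching ones from the result, while a per-element conditional keeps them unchanged. The following is a plain-Python model of that difference, not BigQuery code. Lists of dicts stand in for the hits array, and the function names and sample values are hypothetical.

```python
def overwrite_with_filter(hits, transaction_id):
    """Model of filtering while rebuilding: non-matching hits are dropped."""
    return [
        {**hit, "transaction": {**hit["transaction"], "transactionRevenue": 1}}
        for hit in hits
        if hit["transaction"]["transactionId"] == transaction_id
    ]

def overwrite_with_conditional(hits, transaction_id):
    """Model of a per-element conditional: every hit is kept, only matches change."""
    return [
        {**hit, "transaction": {**hit["transaction"], "transactionRevenue": 1}}
        if hit["transaction"]["transactionId"] == transaction_id
        else hit
        for hit in hits
    ]

# Made-up sample data: two hits, only the first belongs to the target transaction.
hits = [
    {"hitNumber": 1, "transaction": {"transactionId": "ABC123XYZ", "transactionRevenue": 5000000}},
    {"hitNumber": 2, "transaction": {"transactionId": "OTHER", "transactionRevenue": 7000000}},
]

print(len(overwrite_with_filter(hits, "ABC123XYZ")))       # number of hits left after filtering
print(len(overwrite_with_conditional(hits, "ABC123XYZ")))  # number of hits left with the conditional
```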

Is it possible to use WHERE clause in same query as PARTITION BY?

I need to write SQL that keeps only the minimum 5 records for each identifiable record in a table. For this, I use partition by and delete all records where the value returned is greater than 5. When I attempt to use the WHERE clause in the same query as the partition by statement, I get the error "Ordered Analytical Functions not allowed in WHERE Clause". So, in order to get it to work, I have to use three subqueries. My SQL looks like this:
delete mydb.mytable where (field1,field2) in
(
select field1,field2 from
(
select field1,field2,
Rank() over
(
partition BY field1
order by field1,field2
) n
from mydb.mytable
) x
where n > 5
)
The innermost subquery just returns the raw data. Since I can't use WHERE there, I wrapped it with a subquery, the purpose of which is to 1) use WHERE to get records greater than 5 in rank and 2) select only field1 and field2. The reason why I select only those two fields is so that I can use the IN statement for deleting those records in the outermost query.
It works, but it appears a bit cumbersome. I'd like to consolidate the inner two subqueries into a single subquery. Is this possible?
Sounds like you need to use the QUALIFY clause which is the HAVING clause for Window Aggregate functions. Below is my take on what you are trying to accomplish.
Please do not run this SQL directly against your production data without first testing it.
/* Physical Delete */
DELETE TGT
FROM MyDB.MyTable TGT
INNER JOIN
(SELECT Field1
, Field2
FROM MyDB.MyTable
QUALIFY ROW_NUMBER() OVER (PARTITION BY Field1 ORDER BY Field1, Field2)
> 5
) SRC
ON TGT.Field1 = SRC.Field1
AND TGT.Field2 = SRC.Field2
/* Logical Delete */
UPDATE TGT
FROM MyDB.MyTable TGT
,
(SELECT Field1
, Field2
FROM MyDB.MyTable
QUALIFY ROW_NUMBER() OVER (PARTITION BY Field1 ORDER BY Field1, Field2)
> 5
) SRC
SET Deleted = 'Y'
/* RecordExpireDate = Date - 1 */
WHERE TGT.Field1 = SRC.Field1
AND TGT.Field2 = SRC.Field2
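QUALIFY is not available in every engine, so it can help to verify the "keep the first five per group" logic on sample data before touching real tables. The sketch below does that with Python's built-in sqlite3 module, assuming SQLite 3.25 or later for window functions. SQLite has no QUALIFY, so a derived table with WHERE plays its role here. The sample rows and the helper name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (field1 TEXT, field2 INTEGER)")

# Group 'A' has 7 rows (2 too many); group 'B' has 3 rows (nothing to delete).
rows = [("A", n) for n in range(1, 8)] + [("B", n) for n in range(1, 4)]
conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)

def keep_first_five_per_group(conn):
    """Delete every row ranked after the fifth within its field1 group."""
    conn.execute("""
        DELETE FROM mytable
        WHERE (field1, field2) IN (
            SELECT field1, field2
            FROM (
                SELECT field1,
                       field2,
                       ROW_NUMBER() OVER (PARTITION BY field1 ORDER BY field2) AS n
                FROM mytable
            )
            WHERE n > 5
        )
    """)
    return conn.execute(
        "SELECT field1, field2 FROM mytable ORDER BY field1, field2"
    ).fetchall()

remaining = keep_first_five_per_group(conn)
print(remaining)
```

This relies on (field1, field2) identifying a row uniquely. If duplicates of that pair can exist, deleting by the pair removes all of them at once.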
