Difference between Qualify and Having - teradata

Can someone please explain the difference between QUALIFY ... OVER ... PARTITION BY and GROUP BY ... HAVING in Teradata? I would also like to know whether there are any differences in their performance.

QUALIFY is a proprietary extension to filter the result of a Windowed Aggregate Function.
A query is logically processed in a specific order:
FROM: create the basic result set
WHERE: remove rows from the previous result set
GROUP BY: apply aggregate functions on the previous result set
HAVING: remove rows from the previous result set
OVER: apply windowed aggregate functions on the previous result set
QUALIFY: remove rows from the previous result set

The HAVING clause filters the results of aggregate functions (COUNT, MIN, MAX, etc.).
It eliminates whole groups based on some criterion, like this:
SELECT dept_no, MIN(salary), MAX(salary), AVG(salary)
FROM employee
WHERE dept_no IN (100,300,500,600)
GROUP BY dept_no
HAVING AVG(salary) > 37000;
The QUALIFY clause eliminates individual rows based on the value of a windowed function, which returns a value for each of the participating rows.
It is applied after the windowed functions have been evaluated, so it works on the final result set.
SELECT NAME,LOCATION FROM EMPLOYEE
QUALIFY ROW_NUMBER() OVER ( PARTITION BY NAME ORDER BY JOINING_DATE DESC) = 1;
We can also combine HAVING and QUALIFY in one query if we use both aggregate and analytical functions, like below:
SELECT StoreID, SUM(sale),
SUM(profit) OVER (PARTITION BY StoreID)
FROM facts
GROUP BY StoreID, sale, profit
HAVING SUM(sale) > 15
QUALIFY SUM(profit) OVER (PARTITION BY StoreID) > 2;
You can see their order of execution in dnoeth's answer.
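To make the group-level versus row-level distinction concrete, here is a minimal sketch using Python's sqlite3 module. SQLite has no QUALIFY clause, so the row-level filter is emulated by computing the window function in a derived table and filtering on it in the outer WHERE; the sample rows and the reduced employee schema are assumptions made up for illustration, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; not taken from the original question.
conn.executescript("""
CREATE TABLE employee (name TEXT, dept_no INTEGER, salary INTEGER, joining_date TEXT);
INSERT INTO employee VALUES
  ('Ann', 100, 30000, '2019-01-01'),
  ('Ann', 100, 42000, '2021-06-01'),
  ('Bob', 300, 50000, '2020-03-15'),
  ('Cid', 300, 40000, '2018-09-30');
""")

# HAVING: filters whole groups after GROUP BY has aggregated them.
having_rows = conn.execute("""
    SELECT dept_no, AVG(salary)
    FROM employee
    GROUP BY dept_no
    HAVING AVG(salary) > 37000
    ORDER BY dept_no
""").fetchall()

# QUALIFY emulation: compute the window function in a derived table,
# then filter individual rows on its value in the outer WHERE.
qualify_rows = conn.execute("""
    SELECT name, salary
    FROM (
        SELECT name, salary,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY joining_date DESC) AS rn
        FROM employee
    )
    WHERE rn = 1
    ORDER BY name
""").fetchall()

print(having_rows)
print(qualify_rows)
```

The HAVING query returns one row per surviving department, while the emulated QUALIFY keeps detail rows (the most recent row per name).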

Related

SQLITE select unique rows

I have a table where rows appear to be "duplicates" but they are actually not (they have different date).
Suppose each record has a column A that is supposed to be unique. However, the same value of A may appear again later with updated information (with column A unchanged), so A is no longer unique even though it should be.
Therefore I want the table with latest information only. Currently this table contains 500k entries, however the "true" number of unique entries is less than half of it.
I have tried
SELECT *
FROM TABLE
WHERE A = A
AND Date = (SELECT MAX(Date) from TABLE)
ORDER BY DATE
However this only returns 2 results. How do I achieve that?
The subquery on the date is the correct idea, but you must include the column A in the subquery and relate it back to the main table. I prefer to use explicit joins rather than embedding the subquery in the WHERE clause. This is often more efficient as well.
SELECT TABLE.*
FROM TABLE INNER JOIN
(SELECT A, MAX(Date) AS MaxDate FROM TABLE GROUP BY A) AS latest
ON TABLE.A = latest.A AND TABLE.date = latest.MaxDate
ORDER BY A, date
Or even better, I prefer CTE (Common Table Expression) syntax, since it makes the individual queries easier to read:
WITH latest AS (
SELECT A, MAX(Date) AS MaxDate
FROM TABLE
GROUP BY A
)
SELECT TABLE.*
FROM TABLE INNER JOIN latest
ON TABLE.A = latest.A AND TABLE.date = latest.MaxDate
ORDER BY TABLE.A, TABLE.date
Comparison to other answer
The answer by MikeT relies on a non-standard feature of SQLite. That is okay in itself, as long as you are aware that the solution is not compatible with other database engines/servers and SQL dialects.
The next possible gotcha depends on your actual data and table schema (neither of which you shared in the question details). If your data allows multiple rows with the same date for a single value of column A, then the conditions in your question are not enough to definitively remove all duplicates. You would need to identify another column by which to resolve any remaining duplicates, but once again your question did not do that.
However, in testing, I found that my solution allows unresolved duplicates to remain in the results. MikeT's solution eliminates all duplicates, but it does so by arbitrarily excluding one of those duplicates. There are ways to fix either solution to deterministically select which duplicate to keep, but I will not even attempt that unless you post actual data and the table schema, so that my answer is not mere guessing. I'm glad that my answer was useful thus far, but you need to understand your data better (than revealed in the question) to decide which solution is actually best.
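The difference in how the two approaches treat ties can be sketched with Python's sqlite3 module. The table name t (TABLE itself is a reserved word), the info column, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: key 'y' has two rows sharing the same latest date.
conn.executescript("""
CREATE TABLE t (A TEXT, date TEXT, info TEXT);
INSERT INTO t VALUES
  ('x', '2020-01-01', 'old'),
  ('x', '2020-02-01', 'new'),
  ('y', '2020-03-01', 'tie-1'),
  ('y', '2020-03-01', 'tie-2');
""")

# Join against the per-key maximum date: tied rows are all kept.
join_rows = conn.execute("""
    WITH latest AS (
        SELECT A, MAX(date) AS MaxDate FROM t GROUP BY A
    )
    SELECT t.A, t.date, t.info
    FROM t INNER JOIN latest
      ON t.A = latest.A AND t.date = latest.MaxDate
    ORDER BY t.A, t.info
""").fetchall()

# SQLite-specific GROUP BY with a bare column: exactly one row per key,
# but which of the tied rows supplies info is not specified.
group_rows = conn.execute("""
    SELECT A, MAX(date), info FROM t GROUP BY A ORDER BY A
""").fetchall()

print(join_rows)
print(group_rows)
```

With this data the join returns three rows (both tied rows for 'y'), while the GROUP BY form returns two.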
Bonus
Against my better judgement to just keep expanding on answers... since you should really research this separately... here's an example of how you would continue joining this with other queries...
WITH latest AS (
SELECT A, MAX(Date) AS MaxDate
FROM TABLE
GROUP BY A
),
firstResults AS (
SELECT TABLE.*
FROM TABLE INNER JOIN latest
ON TABLE.A = latest.A AND TABLE.date = latest.MaxDate
ORDER BY TABLE.A, TABLE.date
)
SELECT otherTable.*
FROM firstResults JOIN otherTable
ON firstResults.A = otherTable.A
WHERE somecondition = 'foobar'
Another approach if you're using a somewhat recent version of sqlite (3.25 or newer), using the row_number() window function to rank groups of the same a value by date and picking the first one:
WITH cte AS
(SELECT a, date, row_number() OVER (PARTITION BY a ORDER BY date DESC) AS rn
FROM yourtable)
SELECT a, date
FROM cte
WHERE rn = 1;
One important thing to note, since I noticed you mentioning that another answer was slow: for best results this query needs an index on yourtable(a, date DESC), and an index on (a, date) will speed up the other answers given.
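A minimal sketch of the row_number() approach together with the suggested index, run through Python's sqlite3 module. The index name, the other column, and the sample rows are assumptions; the sketch only checks the query result and does not verify whether the planner actually uses the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows; window functions need SQLite 3.25 or newer.
conn.executescript("""
CREATE TABLE mytable (a TEXT, date TEXT, other TEXT);
CREATE INDEX mytable_a_date_desc ON mytable (a, date DESC);
INSERT INTO mytable VALUES
  ('k1', '2021-01-01', 'first'),
  ('k1', '2021-05-01', 'second'),
  ('k2', '2021-03-01', 'only');
""")

# Rank the rows of each a value by date, newest first, and keep rank 1.
latest = conn.execute("""
    WITH cte AS (
        SELECT a, date, other,
               row_number() OVER (PARTITION BY a ORDER BY date DESC) AS rn
        FROM mytable
    )
    SELECT a, date, other FROM cte WHERE rn = 1 ORDER BY a
""").fetchall()

print(latest)
```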
I believe, if I understand what you have written, that you could use :-
SELECT a,max(date), other FROM mytable GROUP BY a ORDER BY date;
Note that the other column stands for any other columns (if present).
However, the other column will be an arbitrary value (from one of the grouped rows), which may well be the required value (in the example it is).
As per :-
Each expression in the result-set is then evaluated once for each
group of rows. If the expression is an aggregate expression, it is
evaluated across all rows in the group. Otherwise, it is evaluated
against a single arbitrarily chosen row from within the group. If
there is more than one non-aggregate expression in the result-set,
then all such expressions are evaluated for the same row.
SQL As Understood By SQLite - SELECT
More correctly, to avoid an arbitrary value for the other column, you could use :-
SELECT
a /* will always be the same and isn't arbitrary */,
max(date) /* will be the maximum date */ AS date,
(SELECT other FROM mytable WHERE a = m.a AND date = m.date) AS other
FROM mytable AS m /* AS m allows the outer query to be distinguished from the inner query */
GROUP BY a /* this effectively removes duplicates on the a column */
ORDER BY date
;
The example below appears to produce the same result.
Example :-
Using the following to populate the table with some generated testing data :-
CREATE TABLE IF NOT EXISTS mytable (a TEXT, date TEXT, other);
WITH cte(count,a,date,other) AS
(
SELECT 1,1,date('now','+'||(abs(random()) % 30)||' days'),'other1'
UNION ALL SELECT count+1,abs(random()) % 20,date('now','+'||(abs(random()) % 30)||' days'), 'other'||(count+1) FROM cte LIMIT 100
)
INSERT INTO mytable (a,date,other) SELECT a,date,other FROM cte
;
SELECT * FROM mytable ORDER BY DATE DESC;
In this case the result was shown as a screenshot (not reproduced here), with the rows that need to be extracted highlighted.
Then after the above has been run the following is run
SELECT * FROM mytable WHERE a = a AND date = (SELECT MAX(date) FROM mytable);
SELECT * FROM mytable WHERE /*a = a AND*/ date = (SELECT MAX(date) FROM mytable);
/* Will only select 1 row per unique value of a BUT other will be an arbitrary value, not necessarily the latest */
SELECT a,max(date), other FROM mytable GROUP BY a /* group by effectively displays unique values of a */;
SELECT
a /* will always be the same and isn't arbitrary */,
max(date) /* will be the maximum date */ AS date,
(SELECT other FROM mytable WHERE a = m.a AND date = m.date) AS other
FROM mytable AS m
GROUP BY a
;
The first two results show that a = a does nothing as it will always be true.
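The third query produces one row per value of a, unordered (the result screenshot is not reproduced here).
The ticks in that screenshot were assigned by checking the value of other against the previous result.
In this case the shorter query works OK, even though the values of other are nominally arbitrary. They aren't really, as it depends upon how the query planner plans the query; SQLite also documents that when a query contains a single min() or max() aggregate, the bare columns take their values from the row that holds that minimum or maximum.
The fourth, more correct query produces the same results.
Results 2 (your original query) and 3 (the original without a = a) contain only the rows whose date equals the overall maximum date (screenshots not reproduced here).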
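As a small check of the claim that the short GROUP BY form and the more explicit form agree, here is a sketch using Python's sqlite3 module. The four sample rows are made up, and the explicit form is written with a derived table (an assumption, chosen so that m.date is unambiguously the per-group maximum) rather than exactly as in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: two dates per value of a, no ties.
conn.executescript("""
CREATE TABLE mytable (a TEXT, date TEXT, other TEXT);
INSERT INTO mytable VALUES
  ('1', '2022-01-10', 'other1'),
  ('1', '2022-01-20', 'other2'),
  ('2', '2022-01-05', 'other3'),
  ('2', '2022-01-02', 'other4');
""")

# Short form: relies on SQLite taking bare columns from the max() row.
short_form = conn.execute(
    "SELECT a, max(date) AS date, other FROM mytable GROUP BY a ORDER BY a"
).fetchall()

# Explicit form: find the maximum date per a first, then look up other.
explicit_form = conn.execute("""
    SELECT m.a, m.date,
           (SELECT other FROM mytable WHERE a = m.a AND date = m.date) AS other
    FROM (SELECT a, max(date) AS date FROM mytable GROUP BY a) AS m
    ORDER BY m.a
""").fetchall()

print(short_form)
print(explicit_form)
```

Both forms return the same rows here; with tied dates the lookup in the explicit form would itself have to choose between several matching rows.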

Analyzing RDF Graph: average number of certain relation

I'm new to SPARQL.
I'm trying to find a way to analyze an RDF graph in general, for example to find the average number of occurrences of a certain relation per subject.
So if we would have the data
[Alice likes Money]
[Bob has Money]
[Bob likes Diving]
[Bob likes Skiing]
What is the average number of "likes" per node, (here: 1.5).
My first try is to simply write a script to iterate all distinct objects and query for the count of likes relations on each.
Is there a way to do this directly in SPARQL?
Yes you can use GROUP BY and aggregates for this kind of thing. See Aggregates in the specification for an overview of this.
If you wanted to get the likes per node you can do so like so:
PREFIX : <http://example.org/ns#>
SELECT ?node (COUNT(*) AS ?likes)
WHERE
{
?s :likes ?node
}
GROUP BY ?node
Here we group by the ?node and do a COUNT(*) which simply counts the number of solutions in the group. This gives us the number of likes for every distinct ?node value in a single query.
If we wanted to find the average likes per node we can also do this using aggregates:
PREFIX : <http://example.org/ns#>
SELECT
(COUNT(*) AS ?likeCount)
(COUNT(DISTINCT ?node) AS ?nodeCount)
(?likeCount / ?nodeCount AS ?avgLikesPerNode)
WHERE
{
?s :likes ?node .
}
Here we use COUNT(*) again to get the total number of likes and then we use COUNT(DISTINCT ?node) which will count the distinct values for ?node and then we can simply divide our ?likeCount by our ?nodeCount to give us the average likes per node.
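The queries above group by the object of :likes. The 1.5 expected in the question corresponds to grouping by the subject instead, which in SPARQL would presumably mean counting DISTINCT ?s rather than DISTINCT ?node. The arithmetic for both readings can be sketched in plain Python over the four example triples (no RDF library is used, and the variable names are illustrative only):

```python
from collections import Counter

# The four example statements from the question as (subject, predicate, object).
triples = [
    ("Alice", "likes", "Money"),
    ("Bob", "has", "Money"),
    ("Bob", "likes", "Diving"),
    ("Bob", "likes", "Skiing"),
]

# Keep only the "likes" statements.
likes = [(s, o) for s, p, o in triples if p == "likes"]

# Equivalent of GROUP BY on the object and on the subject respectively.
likes_per_object = Counter(o for _, o in likes)
likes_per_subject = Counter(s for s, _ in likes)

# Total likes divided by the number of distinct nodes on each side.
avg_per_object = len(likes) / len(likes_per_object)
avg_per_subject = len(likes) / len(likes_per_subject)

print(avg_per_object)   # average likes received per liked thing
print(avg_per_subject)  # average likes expressed per subject
```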

Oracle PL/SQL ORA-00937 "not a single-group group function"

I'm working through the Oracle PDFs to learn PL/SQL.
There is an exercise where I have to create a new table with data from
two other, already existing tables. I thought this would do the trick:
CREATE TABLE new_depts
AS SELECT d.department_id, d.department_name, sum(e.salary) dept_sal
FROM employees e, departments d
WHERE e.department_id = d.department_id;
But this raises the following error:
SQL Error: ORA-00937: not a single-group group function
00937. 00000 - "not a single-group group function"
I can't find anything useful about this error. From what I know so far
about SQL, my code should work fine!
Am I wrong?
Try adding group by clause :
CREATE TABLE new_depts
AS SELECT d.department_id, d.department_name, sum(e.salary) dept_sal
FROM employees e, departments d
WHERE e.department_id = d.department_id
group by d.department_id,d.department_name
Update 1
You need a GROUP BY clause in your select query because the select list mixes an aggregate function, sum(e.salary), with non-aggregated columns. Whenever that is the case, every non-aggregated column in the select list must appear in the GROUP BY clause.
The key to understanding why aggregate functions (or columns specified in the GROUP BY clause) cannot be mixed with other non-aggregate expressions in the select list is the level of detail of the values they produce. The select list of a SELECT statement can include only expressions that produce values on the same level of detail as the others in that select list.
Example 1: incorrect
SELECT avg(col1) --> level of detail of the value is aggregated
,col2 --> level of detail of the value is only for one row
FROM table_a;
Example 2: correct
SELECT avg(col1) --> level of detail of the value is aggregated
,col2 --> level of detail of the value is aggregated
FROM table_a
GROUP BY col2;
By including a column in the GROUP BY clause you aggregate the specified column and change its level of detail from single row to aggregate.
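A minimal runnable sketch of the corrected statement, using Python's sqlite3 module rather than Oracle. The reduced schemas and sample rows are assumptions for illustration. Note that SQLite is more permissive than Oracle here: it would accept the ungrouped version instead of raising ORA-00937, so the sketch only demonstrates the grouped form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, reduced versions of the employees and departments tables.
conn.executescript("""
CREATE TABLE departments (department_id INTEGER, department_name TEXT);
CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, salary INTEGER);
INSERT INTO departments VALUES (10, 'Sales'), (20, 'IT');
INSERT INTO employees VALUES (1, 10, 1000), (2, 10, 1500), (3, 20, 2000);

-- Every non-aggregated column of the select list appears in GROUP BY.
CREATE TABLE new_depts AS
SELECT d.department_id, d.department_name, sum(e.salary) AS dept_sal
FROM employees e JOIN departments d ON e.department_id = d.department_id
GROUP BY d.department_id, d.department_name;
""")

new_depts = conn.execute(
    "SELECT department_id, department_name, dept_sal "
    "FROM new_depts ORDER BY department_id"
).fetchall()

print(new_depts)
```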

expression after group by has same result

Why do the queries
SELECT id, MAX(probe_time) AS Expr1
FROM app_states
GROUP BY logon_id
and
SELECT id, MIN(probe_time) AS Expr1
FROM app_states
GROUP BY logon_id
return the same id?
I want to select the row with the MAX or MIN time for every user.
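I'm afraid you'll have to use a subquery here, correlated on logon_id so that each row is compared with the MAX of its own user:
SELECT id, probe_time
FROM app_states AS a
WHERE probe_time = (SELECT MAX(probe_time) FROM app_states AS b WHERE b.logon_id = a.logon_id)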
BTW, SQLite does interesting optimization on MIN/MAX:
Queries of the following forms will be optimized to run in logarithmic time assuming appropriate indices exist:
SELECT MIN(x) FROM table;
SELECT MAX(x) FROM table;
In order for these optimizations to occur, they must appear in exactly the form shown above - changing only the name of the table and column. It is not permissible to add a WHERE clause or do any arithmetic on the result. The result set must contain a single column. The column in the MIN or MAX function must be an indexed column.
— http://www.sqlite.org/optoverview.html#minmax
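Selecting the row with the MAX or MIN time for every user can be sketched with a correlated subquery, run here through Python's sqlite3 module. The sample rows and the helper function extreme_rows are hypothetical and only serve the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: two probes per user.
conn.executescript("""
CREATE TABLE app_states (id INTEGER PRIMARY KEY, logon_id TEXT, probe_time TEXT);
INSERT INTO app_states (id, logon_id, probe_time) VALUES
  (1, 'alice', '2020-01-01 10:00'),
  (2, 'alice', '2020-01-01 12:00'),
  (3, 'bob',   '2020-01-01 09:00'),
  (4, 'bob',   '2020-01-01 11:00');
""")

def extreme_rows(func):
    """Return each user's row holding the MIN or MAX probe_time."""
    # func is pasted into the SQL text, so only accept known names.
    if func not in ("MIN", "MAX"):
        raise ValueError("func must be 'MIN' or 'MAX'")
    return conn.execute(f"""
        SELECT id, logon_id, probe_time
        FROM app_states AS a
        WHERE probe_time = (SELECT {func}(probe_time)
                            FROM app_states AS b
                            WHERE b.logon_id = a.logon_id)
        ORDER BY id
    """).fetchall()

max_rows = extreme_rows("MAX")
min_rows = extreme_rows("MIN")

print(max_rows)
print(min_rows)
```

Unlike the GROUP BY queries in the question, the id returned here belongs to the same row as the reported probe_time, so the MIN and MAX variants return different ids.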

Sqlite group_concat ordering

In SQLite I can use group_concat to turn
1...A
1...B
1...C
2...A
2...B
2...C
into
1...C,B,A
2...C,B,A
but the order of the concatenation is random - according to docs.
I need to sort the output of group_concat to be
1...A,B,C
2...A,B,C
How can I do this?
Can you not use a subselect with the order by clause in, and then group concat the values?
Something like
SELECT ID, GROUP_CONCAT(Val)
FROM (
SELECT ID, Val
FROM YourTable
ORDER BY ID, Val
)
GROUP BY ID;
To be more precise, according to the docs:
The order of the concatenated elements is arbitrary.
It does not really mean random; it just means that the developers reserve the right to use whatever ordering they wish, even different ones for different queries or in different SQLite versions.
With the current version, this ordering might be the one implied by Adrian Stander's answer, as his code does seem to work. So you might just guard yourself with some unit tests and call it a day. But without examining the source code of SQLite really closely you can never be 100% sure this will always work.
If you are willing to build SQLite from source, you can also try to write your own user-defined aggregate function, but there is an easier way.
Fortunately, since version 3.25.0, you have window functions, providing a guaranteed-to-work, although somewhat ugly solution to your problem.
As you can see in the documentation, window functions have their own ORDER BY clauses:
In the example above, the window frame consists of all rows between the previous row ("1 PRECEDING") and the following row ("1 FOLLOWING"), inclusive, where rows are sorted according to the ORDER BY clause in the window-defn (in this case "ORDER BY a").
Note, that this alone would not necessarily mean that all aggregate functions respect the ordering inside a window frame, but if you take a look at the unit tests, you can see this is actually the case:
do_execsql_test 4.10.1 {
SELECT a,
count() OVER (ORDER BY a DESC),
group_concat(a, '.') OVER (ORDER BY a DESC)
FROM t2 ORDER BY a DESC
} {
6 1 6
5 2 6.5
4 3 6.5.4
3 4 6.5.4.3
2 5 6.5.4.3.2
1 6 6.5.4.3.2.1
0 7 6.5.4.3.2.1.0
}
So, to sum it up, you can write
SELECT ID, GROUP_CONCAT(Val) OVER (PARTITION BY ID ORDER BY Val) FROM YourTable;
resulting in:
1|A
1|A,B
1|A,B,C
2|A
2|A,B
2|A,B,C
Which unfortunately also contains every prefix of your desired aggregations. Instead you want to specify the window frames to always contain the full range, then discard the redundant values, like this:
SELECT DISTINCT ID, GROUP_CONCAT(Val)
OVER (PARTITION BY ID ORDER BY Val ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM YourTable;
or like this:
SELECT * FROM (
SELECT ID, GROUP_CONCAT(Val)
OVER (PARTITION BY ID ORDER BY Val ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM YourTable
)
GROUP BY ID;
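The DISTINCT variant can be tried out with Python's sqlite3 module, as in the following sketch. The rows are inserted deliberately out of order, the vals alias is an assumption added for readability, and SQLite 3.25.0 or newer is required for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows inserted out of order on purpose.
conn.executescript("""
CREATE TABLE YourTable (ID INTEGER, Val TEXT);
INSERT INTO YourTable VALUES
  (1, 'C'), (1, 'A'), (1, 'B'),
  (2, 'B'), (2, 'C'), (2, 'A');
""")

# The frame spans the whole partition, so every row of a partition carries
# the same fully ordered string; DISTINCT then collapses the repeats.
rows = conn.execute("""
    SELECT DISTINCT ID,
           GROUP_CONCAT(Val) OVER (
               PARTITION BY ID ORDER BY Val
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS vals
    FROM YourTable
    ORDER BY ID
""").fetchall()

print(rows)
```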
Having stumbled upon the underlying sorting problem, I tried this (on 10.4.18-MariaDB):
select GROUP_CONCAT(ex.ID) as ID_list
FROM (
SELECT usr.ID
FROM (
SELECT u1.ID as ID
FROM table_users u1
) usr
GROUP BY ID
) ex
... and found the serialized ID_list ordered!
But I don't have an explanation for this now "correct" (?) result.

Resources