Clickhouse topK query on several columns - olap

In ClickHouse, is there any way use the topK query on more than the column ,
for example:
select topK(10)(AGE,COUNTRY) ...
meaning I want the top10 combinations of AGE+COUNTRY,
I only found a workaround using concat on fields and topK on them, wondered if there is any other way.

You can pass array (or tuple) of columns to topK:
SELECT topK(10)([Age, Country])
FROM table
Or use the straightforward calculation (it is much slower but provides the exact result):
SELECT
Age,
Country
FROM table
GROUP BY
Age,
Country
ORDER BY count() DESC
LIMIT 10

Related

Ordering SQLite table by sum of two columns

I have a database that has two integer columns, and I'm trying to find a way to select the top 'x' amount of rows with the highest sums of these two columns. I'm trying to eliminate the need of creating a third column that stores the sum of the two, unless there's a way to to automatically update this column every time one of the other two are altered. I'm using SQLite by the way, as I know there are some slight differences here and there between SQL/SQLite syntax.
Any help is appreciated.
Something like
SELECT a, b
FROM yourtable
ORDER BY a + b DESC
LIMIT :x
should do it.

Count distinct elements across multiple columns

I have a scenario where I got multiple columns with similar content. I want to count how many distinct values are there in all the columns. Slightly different to the below linked case where content of two columns are looked at as a single attribute/element.
Counting DISTINCT over multiple columns
My table is as above. I need to go thru all club columns and count how many distinct clubs there are.
The below code I managed only counts distinct rows. Not individual distinct elements in each column.
select count(*) from( select distinct Club1 Club2 from StudentClubs) as ClubCount
The above returns 6
I need it to output 12 as there are 12 clubs in total.
Thanks in advance.
Actually I found the below solution as a interim. First stacking the columns and then using a count distinct.
It seem to work at least in this example.
select count(distinct A.Club1)
from (
select Club1 from StudentClubs as A
union all
select Club2 from StudentClubs as A
union all
select Club3 from StudentClubs as A
union all
select Club4 from StudentClubs as A
) as A
The above outputs 12.
Please do share if you have better ways to handle this.
Thanks

Find max count for a combination of several fields

Say I have a table with three fields message, environment and function.
I want to count up the records by message, environment and function, and then select the highest scoring row for any combination.
Getting the counts is easy
Table
| summarize count() by message, environment, function
...but how do I get just one row with the top count? My solution so far is to create a new table that tallies the counts, then tally max() by environment, function and then do a join, but this seems like an expensive and complicated workaround.
If I understand your original question correctly, you may want to look into summarize arg_max() as well: https://learn.microsoft.com/en-us/azure/kusto/query/arg-max-aggfunction
Ah, just modify the solution here to use max instead of sum
Add column of totals pr. field value

sqlite subqueries with group_concat as columns in select statements

I have two tables, one contains a list of items which is called watch_list with some important attributes and the other is just a list of prices which is called price_history. What I would like to do is group together 10 of the lowest prices into a single column with a group_concat operation and then create a row with item attributes from watch_list along with the 10 lowest prices for each item in watch_list. First I tried joins but then I realized that the operations where happening in the wrong order so there was no way I could get the desired result with a join operation. Then I tried the obvious thing and just queried the price_history for every row in the watch_list and just glued everything together in the host environment which worked but seemed very inefficient. Now I have the following query which looks like it should work but it's not giving me the results that I want. I would like to know what is wrong with the following statement:
select w.asin,w.title,
(select group_concat(lowest_used_price) from price_history as p
where p.asin=w.asin limit 10)
as lowest_used
from watch_list as w
Basically I want the limit operation to happen before group_concat does anything but I can't think of a sql statement that will do that.
Figured it out, as somebody once said "All problems in computer science can be solved by another level of indirection." and in this case an extra select subquery did the trick:
select w.asin,w.title,
(select group_concat(lowest_used_price)
from (select lowest_used_price from price_history as p
where p.asin=w.asin limit 10)) as lowest_used
from watch_list as w

Sqlite group_concat ordering

In Sqlite I can use group_concat to do:
1...A
1...B
1...C
2...A
2...B
2...C
1...C,B,A
2...C,B,A
but the order of the concatenation is random - according to docs.
I need to sort the output of group_concat to be
1...A,B,C
2...A,B,C
How can I do this?
Can you not use a subselect with the order by clause in, and then group concat the values?
Something like
SELECT ID, GROUP_CONCAT(Val)
FROM (
SELECT ID, Val
FROM YourTable
ORDER BY ID, Val
)
GROUP BY ID;
To be more precise, according to the docs:
The order of the concatenated elements is arbitrary.
It does not really mean random, it just means that the developers reserve the right to use whatever ordering they whish, even different ones for different queries or in different SQLite versions.
With the current version, this ordering might be the one implied by Adrian Stander's answer, as his code does seem to work. So you might just guard yourself with some unit tests and call it a day. But without examining the source code of SQLite really closely you can never be 100% sure this will always work.
If you are willing to build SQLite from source, you can also try to write your own user-defined aggregate function, but there is an easier way.
Fortunately, since version 3.25.0, you have window functions, providing a guaranteed-to-work, although somewhat ugly solution to your problem.
As you can see in the documentation, window functions have their own ORDER BY clauses:
In the example above, the window frame consists of all rows between the previous row ("1 PRECEDING") and the following row ("1 FOLLOWING"), inclusive, where rows are sorted according to the ORDER BY clause in the window-defn (in this case "ORDER BY a").
Note, that this alone would not necessarily mean that all aggregate functions respect the ordering inside a window frame, but if you take a look at the unit tests, you can see this is actually the case:
do_execsql_test 4.10.1 {
SELECT a,
count() OVER (ORDER BY a DESC),
group_concat(a, '.') OVER (ORDER BY a DESC)
FROM t2 ORDER BY a DESC
} {
6 1 6
5 2 6.5
4 3 6.5.4
3 4 6.5.4.3
2 5 6.5.4.3.2
1 6 6.5.4.3.2.1
0 7 6.5.4.3.2.1.0
}
So, to sum it up, you can write
SELECT ID, GROUP_CONCAT(Val) OVER (PARTITION BY ID ORDER BY Val) FROM YourTable;
resulting in:
1|A
1|A,B
1|A,B,C
2|A
2|A,B
2|A,B,C
Which unfortunately also contains every prefix of your desired aggregations. Instead you want to specify the window frames to always contain the full range, then discard the redundant values, like this:
SELECT DISTINCT ID, GROUP_CONCAT(Val)
OVER (PARTITION BY ID ORDER BY Val ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM YourTable;
or like this:
SELECT * FROM (
SELECT ID, GROUP_CONCAT(Val)
OVER (PARTITION BY ID ORDER BY Val ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM YourTable
)
GROUP BY ID;
Stumbling upon the underlying sorting-problem I tried this:
(... on 10.4.18-MariaDB)
select GROUP_CONCAT(ex.ID) as ID_list
FROM (
SELECT usr.ID
FROM (
SELECT u1.ID as ID
FROM table_users u1
) usr
GROUP BY ID
) ex
... and found the serialized ID_list ordered!
But I don't have an explanation for this now "correct" (?) result.

Resources