KQL multiple aggregates in a summarize statement

I have data in a table for azure data explorer, let's say the following columns:
Day, non-unique-ID, Message-Content
What I want as an output is a table containing:
Day, Count of records per day, distinct Count of non-unique-ID per day
I know how to get one or the other:
summarize count() by Day
summarize dcount(non-unique-ID) by Day
but I don't know how to get a table containing both of those columns, because as far as I can tell summarize only lets me run a single aggregate per command.

You can use multiple aggregation functions in the same summarize operator; all you have to do is separate them with commas. So this will work:
summarize count(), dcount(non-unique-ID) by Day
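To check the shape of the result outside Azure Data Explorer, the same "several aggregates in one grouping pass" idea can be sketched in Python with the standard-library sqlite3 module. The table name events, its columns, and the sample rows are made up for illustration. COUNT(DISTINCT ...) stands in for dcount here, with the difference that it is exact whereas dcount returns an estimate.

```python
import sqlite3


def daily_counts(rows):
    """Return (day, record_count, distinct_id_count) per day, sorted by day."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table mirroring the Day / ID / Message-Content columns.
    conn.execute("CREATE TABLE events (day TEXT, id TEXT, message TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    # Two aggregates in one query, separated by a comma.
    result = conn.execute(
        "SELECT day, COUNT(*), COUNT(DISTINCT id) "
        "FROM events GROUP BY day ORDER BY day"
    ).fetchall()
    conn.close()
    return result


sample_rows = [
    ("2024-01-01", "a", "x"),
    ("2024-01-01", "a", "y"),
    ("2024-01-01", "b", "z"),
    ("2024-01-02", "c", "w"),
]
print(daily_counts(sample_rows))
# [('2024-01-01', 3, 2), ('2024-01-02', 1, 1)]
```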

Related

Kusto summarize 3 or more columns

Is there a way to use summarize to group 3 or more columns? I've been able to successfully get data from 1 or 2 columns then group by another column, but it breaks when trying to add a 3rd. This question asks how to add a column, but only regards adding a 2nd, not a 3rd or 4th. Using the sample help cluster on Azure Data Explorer and working with the Covid19 table, ideally I would be able to do this:
Covid19
| summarize by Country, count() Recovered, count() Confirmed, count() Deaths
| order by Country asc
And return results like this
But that query throws an error "Syntax Error. A recognition error occurred. Token: Recovered. Line: 2, Position: 36"

I had the right basic idea, but the syntax was wrong: count() takes no column argument, and the aggregates have to come before the by clause. You can use sum, dcount, or max on the columns instead:
Covid19
| summarize sum(Recovered), sum(Confirmed), sum(Deaths) by Country
| order by Country asc
Another example:
Covid19
| where Timestamp == max_of(Timestamp, Timestamp)
| summarize confirmedCases = max(Confirmed), active = max(Active), recovered = max(Recovered), deaths = max(Deaths) by Country
| order by Country asc
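In this example I wanted the latest data for each of the selected columns. Note that max_of(Timestamp, Timestamp) just returns Timestamp, so the where line does not actually narrow the rows to the latest timestamp; it is the max on each column that does the work, returning the largest value per country. Even with a where clause that did filter to the latest rows, you could not just list the columns: summarize requires an aggregate function for every column that is not in the by clause, so I used max on each column.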
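The difference between "largest value per country" and "value from the latest row per country" can be sketched in plain Python. The field names and sample rows below are hypothetical, not taken from the Covid19 table. The two results agree only when a column never decreases over time.

```python
def latest_per_country(rows):
    """Keep, for each country, the row with the greatest timestamp."""
    latest = {}
    for row in rows:
        country = row["country"]
        if country not in latest or row["timestamp"] > latest[country]["timestamp"]:
            latest[country] = row
    return latest


def max_per_country(rows, column):
    """Largest value of `column` per country, regardless of timestamp."""
    best = {}
    for row in rows:
        country = row["country"]
        best[country] = max(best.get(country, row[column]), row[column])
    return best


# Made-up data: ISO date strings compare correctly as plain strings.
sample = [
    {"country": "A", "timestamp": "2020-05-01", "active": 50},
    {"country": "A", "timestamp": "2020-05-02", "active": 30},
    {"country": "B", "timestamp": "2020-05-01", "active": 5},
    {"country": "B", "timestamp": "2020-05-02", "active": 8},
]
print(latest_per_country(sample)["A"]["active"])  # 30 (value on the latest day)
print(max_per_country(sample, "active")["A"])     # 50 (largest value ever seen)
```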

Counting overlapping prescriptions in R

Firstly, I'm new to R and I apologize. So I'm working with data involving prescriptions. Since it's on a secure VM, I can't copy and paste, but the data structure looks like this:
Patient ID | Medication | Start Date | End Date
There are multiple rows for each patient, since each patient has been prescribed more than one medication.
What I want to do is the following:
Find out how many medications (and which ones) each patient is on that overlap in time, and then return how many overlapping prescriptions each patient has. Is there a way to do this in R?
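The question asks about R, but the underlying logic is language-independent, so here is a minimal sketch in Python. It assumes start and end dates are inclusive and treats two prescriptions as overlapping when each starts on or before the other ends. The tuple layout, drug names, and dates are invented for illustration.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


def overlapping_pairs(prescriptions):
    """Map patient_id -> list of (med_a, med_b) whose date ranges overlap."""
    by_patient = defaultdict(list)
    for patient_id, medication, start, end in prescriptions:
        by_patient[patient_id].append((medication, start, end))

    overlaps = {}
    for patient_id, items in by_patient.items():
        pairs = []
        # Compare every pair of prescriptions belonging to the same patient.
        for (med_a, start_a, end_a), (med_b, start_b, end_b) in combinations(items, 2):
            # Inclusive interval overlap test (an assumption about the data).
            if start_a <= end_b and start_b <= end_a:
                pairs.append((med_a, med_b))
        overlaps[patient_id] = pairs
    return overlaps


sample = [
    (1, "drug_a", date(2020, 1, 1), date(2020, 1, 31)),
    (1, "drug_b", date(2020, 1, 15), date(2020, 2, 15)),
    (1, "drug_c", date(2020, 3, 1), date(2020, 3, 10)),
    (2, "drug_a", date(2020, 1, 1), date(2020, 1, 5)),
]
result = overlapping_pairs(sample)
print({pid: len(pairs) for pid, pairs in result.items()})
# {1: 1, 2: 0}
```

The pairwise comparison is quadratic in the number of prescriptions per patient, which is usually fine for a handful of medications each.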

Count and summarise ID and the Date of Purchase while creating a third column that reflects the amount of Purchased per one day and Customer

Good afternoon dear Community,
I am quite new to the R language, so forgive me if my description of the problem is not very precise or specific yet.
I have a data frame which contains two columns, the first being the ID and the second the Date of purchase. However, some IDs appear more than once on the same Date, and I would like to summarise by ID and Date, with a third column (amount of Purchases) that reflects the number of purchases.
ID and Purchase Date
Many thanks in Advance.

There is an R package called dplyr that makes this kind of aggregation very easy. In your case you could summarise the data using a few lines of code.
library(dplyr)
results <- df %>%
  group_by(ID, Date) %>%
  summarise(numPurchases = n(),
            totalPurchases = sum(Quantity))
df would be your input data. Your results will have the ID and Date columns, plus numPurchases, the number of rows (purchases) per ID per Date. The totalPurchases column sums a Quantity column per ID per Date; if your data has only ID and Date, as described in the question, leave that line out. Hope that helps.

`dplyr` count function for unique items in field

I have searched for this on here a few times, so apologies if this is a duplicate.
I am working with dplyr for the first time, and I am having trouble coming up with what I'd like. If I was doing SQL, the query would look like:
select count(customer_id), sum(sales), sum(sales) / count(customer_id), *
from data_table
group by salesperson_id
In words, I want to:
group the data by salesperson
add up the total sales
count the number of unique customers
find the average sales per customer for each sales person.
I don't want to strip away "irrelevant" fields at this point, because they will become relevant in later steps.
I am getting stuck, specifically because the only counting function dplyr provides doesn't take any arguments. What aggregate function should I use to count distinct items in a field?

Responding to the question: What aggregate function should I use to count distinct items in a field?
n_distinct()
See docs here.
A broader example, though a reprex in the original question would help:
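library(dplyr)

data_table %>%
  group_by(salesperson_id) %>%
  mutate(
    customers = n_distinct(customer_id),  # unique customers per salesperson
    total_sales = sum(sales),             # group total; the per-row sales column is kept
    sales_per_customer = total_sales / customers
  )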
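For comparison, the same three quantities (total sales, distinct customers, sales per customer) can be sketched in plain Python, where a set per salesperson plays the role of n_distinct(). The row layout and sample values are hypothetical.

```python
from collections import defaultdict


def sales_summary(rows):
    """Per salesperson: total sales, distinct customers, sales per customer."""
    totals = defaultdict(float)
    customers = defaultdict(set)
    for salesperson_id, customer_id, sales in rows:
        totals[salesperson_id] += sales
        # A set keeps each customer once, however many rows they appear in.
        customers[salesperson_id].add(customer_id)
    return {
        sp: {
            "sales": totals[sp],
            "customers": len(customers[sp]),
            "sales_per_customer": totals[sp] / len(customers[sp]),
        }
        for sp in totals
    }


sample = [
    ("s1", "c1", 100.0),
    ("s1", "c1", 50.0),
    ("s1", "c2", 50.0),
    ("s2", "c3", 30.0),
]
summary = sales_summary(sample)
print(summary["s1"])
# {'sales': 200.0, 'customers': 2, 'sales_per_customer': 100.0}
```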

Selecting multiple maximum values? In Sqlite?

Super new to SQLite but I thought it can't hurt to ask.
I have something like the following table (Not allowed to post images yet) pulling data from multiple tables to calculate the TotalScore:
Name TotalScore
Course1 15
Course1 12
Course2 9
Course2 10
How the heck do I SELECT only the max value for each course? I've managed to use
ORDER BY TotalScore LIMIT 2
But I may end up with multiple Courses in my final product, so LIMIT 2 etc won't really help me.
Thoughts? Happy to put up the rest of my query if it helps?

You can GROUP the resultset by Name and then use the aggregate function MAX():
SELECT Name, max(TotalScore)
FROM my_table
GROUP BY Name
You will get one row for each distinct course, with the name in column 1 and the maximum TotalScore for this course in column 2.
Further hints
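In standard SQL you can only SELECT columns that are either grouped by (Name) or wrapped in aggregate functions (max(TotalScore)). SQLite does accept other "bare" columns in such a query, but their values come from a single row of the group that SQLite picks, so it is safer not to rely on that. If you need another column (e.g. Description) in the resultset, you can group by more than one column: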
...
GROUP BY Name, Description
To filter the resulting rows further, you need to use HAVING instead of WHERE:
SELECT Name, max(TotalScore)
FROM my_table
-- WHERE clause would be here
GROUP BY Name
HAVING max(TotalScore) > 5
WHERE filters the raw table rows, HAVING filters the resulting grouped rows.
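The WHERE versus HAVING distinction can be tried out with Python's standard-library sqlite3 module. This sketch loads the four rows from the question into an in-memory my_table; the thresholds 15 and 10 are chosen only to make the difference visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (Name TEXT, TotalScore INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("Course1", 15), ("Course1", 12), ("Course2", 9), ("Course2", 10)],
)

# WHERE drops individual rows before grouping: the 15 is excluded,
# so Course1's maximum becomes 12.
where_rows = conn.execute(
    "SELECT Name, max(TotalScore) FROM my_table "
    "WHERE TotalScore < 15 GROUP BY Name ORDER BY Name"
).fetchall()

# HAVING drops whole groups after aggregation: Course2 (max 10) is removed.
having_rows = conn.execute(
    "SELECT Name, max(TotalScore) FROM my_table "
    "GROUP BY Name HAVING max(TotalScore) > 10 ORDER BY Name"
).fetchall()

print(where_rows)   # [('Course1', 12), ('Course2', 10)]
print(having_rows)  # [('Course1', 15)]
conn.close()
```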

Functions like max and sum are "aggregate functions", meaning they aggregate multiple rows together. By default they aggregate all rows into one value, like max(totalscore), but with group by you get one value per group. group by says how to group the rows together before aggregating.
select name, max(totalscore)
from scores
group by name;
This groups all the rows with the same name together and then does a max(totalscore) for each name.
sqlite> select name, max(totalscore) from scores group by name;
Course1|15
Course2|10
