Kusto select distinct on one column only

Example query:
traces
| project message, timestamp
This outputs something like:
message | timestamp
------------------------------
A 2022-07-09 00:00:00
B 2022-07-11 01:00:00
A 2022-07-11 02:00:00
I want a query that keeps only one row per message and gives this output:
message | timestamp
------------------------------
A 2022-07-09 00:00:00
B 2022-07-11 01:00:00

You could use the min() aggregation function if you only have two fields of interest (message, timestamp), or the arg_min() aggregation function otherwise.
For example:
traces | summarize min(timestamp) by message
traces | summarize arg_min(timestamp, *) by message
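As a rough illustration of what the arg_min() version computes, here is a Python sketch (not KQL) that applies the same rule to the sample rows: for each message, keep the entire row with the smallest timestamp. The helper name arg_min_by and the tuple layout are made up for this example.

```python
from datetime import datetime

# Sample rows from the question, as (message, timestamp) tuples.
rows = [
    ("A", datetime(2022, 7, 9, 0, 0, 0)),
    ("B", datetime(2022, 7, 11, 1, 0, 0)),
    ("A", datetime(2022, 7, 11, 2, 0, 0)),
]

def arg_min_by(rows, key_index, value_index):
    """For each distinct key, keep the whole row whose value is smallest
    (roughly what `summarize arg_min(value, *) by key` returns)."""
    best = {}
    for row in rows:
        key = row[key_index]
        if key not in best or row[value_index] < best[key][value_index]:
            best[key] = row
    return list(best.values())

result = arg_min_by(rows, key_index=0, value_index=1)
for message, timestamp in result:
    print(message, timestamp)
```

With only two columns, min(timestamp) gives the same answer; arg_min() becomes useful once there are extra columns that should come from the same (earliest) row.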

Just for SQLite, is there an easy way to convert a column of text (like 21-Sep-2022) into a valid date format in a query?

I know this is easy in other databases such as SQL Server and Oracle, which have built-in functions for it. I'm now in the same situation with SQLite, but I could not find any "cast", "convert" or "date" function that works and gives a proper result.
I've tried DATE(), but the text does not seem to be recognized and only NULL is returned.

Something like this should do the job. Field name "f", table name "x".
select
-- YEAR
printf('%04d-',substr( f ,-4)) ||
-- LOOKUP FUNCTION for MONTH
printf('%02d-',
CASE substr(f, instr(f,'-')+1,3 )
WHEN 'Jan' THEN 1
WHEN 'Feb' THEN 2
WHEN 'Mar' THEN 3
WHEN 'Apr' THEN 4
WHEN 'May' THEN 5
WHEN 'Jun' THEN 6
WHEN 'Jul' THEN 7
WHEN 'Aug' THEN 8
WHEN 'Sep' THEN 9
WHEN 'Oct' THEN 10
WHEN 'Nov' THEN 11
WHEN 'Dec' THEN 12
END)
||
-- DAY
printf('%02d', substr(f, 1, instr(f,'-')-1) )
as thedate
from x
Input (table x):
+-------------+
| f           |
+-------------+
| 1-Jan-2023  |
| 19-Sep-2022 |
| 24-Dec-1989 |
+-------------+
Output:
+------------+
| thedate    |
+------------+
| 2023-01-01 |
| 2022-09-19 |
| 1989-12-24 |
+------------+
The result is formatted YYYY-MM-DD, and can be processed as a date in SQLite.
The expression does not validate its input: values that are not in the D-Mon-YYYY form will silently produce malformed output instead of raising an error.
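A quick way to check the expression is to run it through Python's built-in sqlite3 module against the sample rows. This sketch uses the day part with instr(f,'-')-1 (so the dash is not passed to printf) and wraps the result in date(), which returns NULL for strings SQLite cannot interpret as a date. The in-memory table simply mirrors the x/f names used above.

```python
import sqlite3

# In-memory copy of the sample data: table x with a text column f.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (f TEXT)")
conn.executemany("INSERT INTO x (f) VALUES (?)",
                 [("1-Jan-2023",), ("19-Sep-2022",), ("24-Dec-1989",)])

# Year, month lookup and day, each zero-padded by printf.
CONVERT = """
SELECT f,
       printf('%04d-', substr(f, -4)) ||
       printf('%02d-', CASE substr(f, instr(f, '-') + 1, 3)
           WHEN 'Jan' THEN 1 WHEN 'Feb' THEN 2 WHEN 'Mar' THEN 3
           WHEN 'Apr' THEN 4 WHEN 'May' THEN 5 WHEN 'Jun' THEN 6
           WHEN 'Jul' THEN 7 WHEN 'Aug' THEN 8 WHEN 'Sep' THEN 9
           WHEN 'Oct' THEN 10 WHEN 'Nov' THEN 11 WHEN 'Dec' THEN 12
       END) ||
       printf('%02d', substr(f, 1, instr(f, '-') - 1)) AS thedate
FROM x
ORDER BY rowid
"""

# date() yields NULL for unparseable strings, so equal columns mean valid dates.
rows = conn.execute(
    f"SELECT f, thedate, date(thedate) FROM ({CONVERT})").fetchall()
for f, thedate, checked in rows:
    print(f, thedate, checked)
```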

Kusto - Grouping by week, Week-ending

I come up against this quite often and haven't figured it out yet. Take the query below. I am trying to group into 7-day buckets; however, the first and last buckets are always shorter than 7 days. The middle buckets are whole weeks (or 6.23 days, whatever that means).
How do I write a query where I can offset by the end date? Additionally, how can I make sure my start date is also not truncated?
requests
| where timestamp > startofday(ago(90d))
and timestamp < endofday(now()-1d)
| summarize
min(timestamp),
max(timestamp)
by
bin(timestamp, 7d)
| extend duration = max_timestamp - min_timestamp
| project-away timestamp
| order by max_timestamp

You can use bin_at() to specify the reference point for the binning. See the example below and the documentation: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/binatfunction.
If relevant, you could also consider using startofweek() and/or endofweek().
range timestamp from startofday(ago(30d)) to endofday(ago(1d)) step 1111ms
| summarize max(timestamp), min(timestamp) by timestamp = bin_at(timestamp, 7d, endofday(ago(1d)))
| extend duration = max_timestamp - min_timestamp
| project-away timestamp
| order by max_timestamp
Output:
| max_timestamp | min_timestamp | duration |
|-----------------------------|-----------------------------|--------------------|
| 2020-06-25 23:59:59.6630000 | 2020-06-19 00:00:00.1490000 | 6.23:59:59.5140000 |
| 2020-06-18 23:59:59.0380000 | 2020-06-12 00:00:00.6350000 | 6.23:59:58.4030000 |
| 2020-06-11 23:59:59.5240000 | 2020-06-05 00:00:00.0100000 | 6.23:59:59.5140000 |
| 2020-06-04 23:59:58.8990000 | 2020-05-29 00:00:00.4960000 | 6.23:59:58.4030000 |
| 2020-05-28 23:59:59.3850000 | 2020-05-27 00:00:00.0000000 | 1.23:59:59.3850000 |
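To see why the bins line up with the end date, here is a Python sketch of the rounding bin_at() performs: each value is moved down to the start of its bin, where bin starts sit a whole number of bin sizes away from the fixed point. The fixed point below (2020-06-25 23:59:59.999999) is an assumed stand-in for endofday(ago(1d)) on the day the output above was produced; Python datetimes stop at microseconds, while Kusto uses 100 ns ticks. The sketch also shows why the oldest bucket is still short: the 30-day range is 4 full weeks plus 2 days, so the first bucket only holds 2 days. Choosing a window that is a whole number of weeks (for example 28 days) avoids that.

```python
from datetime import datetime, timedelta

def bin_at(value, bin_size, fixed_point):
    """Round value down to its bin start, with bins aligned on fixed_point."""
    # timedelta // timedelta is floor division returning an int, so values
    # before fixed_point get a negative bin index.
    k = (value - fixed_point) // bin_size
    return fixed_point + k * bin_size

week = timedelta(days=7)
# Assumed stand-in for endofday(ago(1d)) when the sample output was produced.
fixed = datetime(2020, 6, 25, 23, 59, 59, 999999)

for ts in [datetime(2020, 5, 27, 0, 0, 0),
           datetime(2020, 6, 19, 0, 0, 0, 149000),
           datetime(2020, 6, 25, 23, 59, 59, 663000)]:
    print(ts, "->", bin_at(ts, week, fixed))
```

The last two timestamps fall in the same bin, matching the first row of the output table, while 2020-05-27 lands in a bin that the 30-day range only partly covers.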

Grouping similar column string values

I have a table in Azure Log Analytics where messages are logged.
There aren't many distinct messages, but each one contains a variable part such as a user id or a timestamp.
I need to count the distinct message types grouped by one hour intervals, ignoring the variable elements in every message (UUID and timestamp in this case).
I don't know all the message types.
I cannot touch anything else, I am forced to work with this table.
Example data:
timestamp | message
----------|--------------------------------------------------------
| Message type A for user id 993215f6-c42a-4957-bd55-78d71306a8d0
| Message type A for user id 60e7d02c-770a-4641-b379-6bd33fcd563c
| Message type A for user id 5bf7646c-092b-4e20-ba43-de7fe01010ea
| Another message type containing timestamp hh:mm:ss
| Another message type containing timestamp hh:mm:ss
| Another message type containing timestamp hh:mm:ss
| Type C message <variable_string>
Desired output:
timestamp | distinct_message | count
----------------------------|--------------------------------------------|------
10/2/2019, 10:00:00.000 AM | Message type A for user id | 25
10/2/2019, 10:00:00.000 AM | Another message type containing timestamp | 13
10/2/2019, 10:00:00.000 AM | Type C message | 0
10/2/2019, 11:00:00.000 AM | Message type A for user id | 4
10/2/2019, 11:00:00.000 AM | Another message type containing timestamp | 6
10/2/2019, 11:00:00.000 AM | Type C message | 2
This is what I've managed to create, but my knowledge of KQL is quite limited.
let regex_uid = "[[:xdigit:]]+-[[:xdigit:]]+-[[:xdigit:]]+-[[:xdigit:]]+-[[:xdigit:]]+";
traces
| where timestamp > ago(1d)
| extend message = replace(regex_uid, "", message)
| extend message = replace("[0-9]+", "", message)
| extend message = iif(message startswith "Type C message", "Type C message", message )
| project timestamp, message, operation_Name
| summarize count(operation_Name) by bin(timestamp, 1h), message
Is there any better way to do this?

Another option to consider is the reduce operator: https://learn.microsoft.com/en-us/azure/kusto/query/reduceoperator
The output won't be identical to the one in your question, though if I understand your intention correctly, it follows the same principles.
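For comparison with reduce, this Python sketch re-implements the normalize-then-count idea from the question so the grouping logic can be tested outside Log Analytics. Python's re module has no [[:xdigit:]] class, so the hex digits are spelled out, and the sample timestamps are invented. Like summarize, it only reports combinations that actually occur, so a zero-count row such as the one in the desired output does not appear. Leftover punctuation such as the colons of hh:mm:ss is kept, which is harmless because it is the same for every message of that type.

```python
import re
from collections import Counter
from datetime import datetime

# Same shape as regex_uid in the question, with [[:xdigit:]] spelled out.
UUID_RE = re.compile(r"[0-9A-Fa-f]+-[0-9A-Fa-f]+-[0-9A-Fa-f]+-[0-9A-Fa-f]+-[0-9A-Fa-f]+")
DIGITS_RE = re.compile(r"[0-9]+")

def normalize(message):
    """Strip variable parts, mirroring the three extend steps in the KQL query."""
    message = UUID_RE.sub("", message)
    message = DIGITS_RE.sub("", message)
    if message.startswith("Type C message"):
        message = "Type C message"
    # Not in the KQL version: trim leftover whitespace.
    return message.strip()

def count_per_hour(rows):
    """Count (hour, normalized message) pairs, like summarize ... by bin(timestamp, 1h), message."""
    counts = Counter()
    for timestamp, message in rows:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(hour, normalize(message))] += 1
    return counts

# Hypothetical sample rows (timestamps invented for illustration).
rows = [
    (datetime(2019, 10, 2, 10, 5), "Message type A for user id 993215f6-c42a-4957-bd55-78d71306a8d0"),
    (datetime(2019, 10, 2, 10, 40), "Message type A for user id 60e7d02c-770a-4641-b379-6bd33fcd563c"),
    (datetime(2019, 10, 2, 11, 1), "Message type A for user id 5bf7646c-092b-4e20-ba43-de7fe01010ea"),
    (datetime(2019, 10, 2, 10, 7), "Another message type containing timestamp 10:07:12"),
    (datetime(2019, 10, 2, 11, 2), "Type C message abc42"),
]

for (hour, message), n in sorted(count_per_hour(rows).items()):
    print(hour, message, n)
```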

How do I find the day difference between two date columns in Azure Application Insights?

We have a log where we store the searches happening on our platform. Each search has a departure date, and I want to find the searches whose departure date is more than 330 days from today.
I am trying to run a query that finds the difference between the departure date column and logTime (the time the event was written to the log), but I get the following error:
Query could not be parsed at 'datetime("departureDate")' on line [5,54]
Token: datetime("departureDate")
Line: 5
Position: 54
The departure date format is mm/dd/yyyy, and logTime uses the typical App Insights datetime format.
This is the query I am running:
customEvents
| where name == "SearchLog"
| extend departureDate = tostring(customDimensions.departureDate)
| extend logTime = tostring(customDimensions.logTime)
| where datetime_diff('day',datetime("departureDate"),datetime("logTime")) > 200
As suggested, I ran the query below, but now I get 0 results even though there is data that satisfies the criteria.
customEvents
| where name == "SearchLog"
| extend departureDate = tostring(customDimensions.departureDate)
| extend logTime = tostring(customDimensions.logTime)
| where datetime_diff('day',todatetime(departureDate),todatetime(logTime)) > 200
Example:
departureDate
04/09/2020
logTime
8/13/2019 8:45:39 AM -04:00
I also tried the query below to check whether the date format is supported, and it gave the correct response.
customEvents
| project datetime_diff('day', datetime('04/30/2020'),datetime('8/13/2019 8:25:51 AM -04:00'))

Please use the query below; the todatetime() function converts a string to a datetime value:
customEvents
| where name == "SearchLog"
| extend departureDate = tostring(customDimensions.departureDate)
| extend logTime = tostring(customDimensions.logTime)
| where datetime_diff('day',todatetime(departureDate),todatetime(logTime)) > 200
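To sanity-check what the filter should return for the example row in the question, here is a Python sketch of the same comparison. It assumes both values are compared as UTC dates and counts calendar-day boundaries, which is how datetime_diff('day', ...) is documented to behave; the helper name day_diff is made up. With these inputs, logTime converts to 2019-08-13 12:45:39 UTC and the difference is 240 days, so this row would be expected to pass the > 200 filter.

```python
from datetime import datetime, timezone

def day_diff(departure_date, log_time):
    """Calendar-day difference between the two string values, compared as UTC dates."""
    # departureDate is mm/dd/yyyy; assumed to mean midnight UTC.
    departure = datetime.strptime(departure_date, "%m/%d/%Y").replace(tzinfo=timezone.utc)
    # logTime looks like "8/13/2019 8:45:39 AM -04:00"; convert it to UTC.
    logged = datetime.strptime(log_time, "%m/%d/%Y %I:%M:%S %p %z").astimezone(timezone.utc)
    return (departure.date() - logged.date()).days

diff = day_diff("04/09/2020", "8/13/2019 8:45:39 AM -04:00")
print(diff, diff > 200)
```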

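The double quotes inside datetime() in the where clause should be removed. Your code should look like:
| where datetime_diff('day',datetime(departureDate),datetime(logTime)) > 200
Note, however, that datetime() is the syntax for datetime literals; to convert the values of string columns such as departureDate and logTime row by row, todatetime() (as in the answer above) is the appropriate function.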

SQLite subtract time difference between two tables if there is a match

I need some help with an SQLite query. I have two tables, one called 'production' and one called 'pause':
CREATE TABLE production (
date TEXT,
item TEXT,
begin TEXT,
end TEXT
);
CREATE TABLE pause (
date TEXT,
begin TEXT,
end TEXT
);
For every item that is produced, an entry is created in the production table with the current date, the start time and the end time (two timestamps in the format HH:MM:SS). So let's assume the production table looks like this:
+------------+-------------+------------+----------+
| date | item | begin | end |
+------------+-------------+------------+----------+
| 2013-07-31 | Item 1 | 06:18:00 | 08:03:05 |
| 2013-08-01 | Item 2 | 06:00:03 | 10:10:10 |
| 2013-08-01 | Item 1 | 10:30:15 | 14:20:13 |
| 2013-08-01 | Item 1 | 15:00:10 | 16:00:00 |
| 2013-08-02 | Item 3 | 08:50:00 | 15:00:00 |
+------------+-------------+------------+----------+
The second table also contains a date, a start time and an end time. Let's assume the 'pause' table looks like this:
+------------+------------+----------+
| date | begin | end |
+------------+------------+----------+
| 2013-08-01 | 08:00:00 | 08:30:00 |
| 2013-08-01 | 12:00:00 | 13:30:00 |
| 2013-08-02 | 10:00:00 | 10:30:00 |
| 2013-08-02 | 13:00:00 | 14:00:00 |
+------------+------------+----------+
Now I want a table that contains the time difference between the production begin and end time for every item. If there is a matching entry in the 'pause' table, the pause time should be subtracted.
So basically, the end result should look like:
+------------+------------+-------------------------------------------------+
| date | Item | time difference (in seconds), excluding pause |
+------------+------------+-------------------------------------------------+
| 2013-07-31 | Item 1 | 6305 |
| 2013-08-01 | Item 1 | 11988 |
| 2013-08-01 | Item 2 | 13207 |
| 2013-08-02 | Item 3 | 16800 |
+------------+------------+-------------------------------------------------+
I am not really sure how to accomplish this with SQLite. I know this sort of calculation is possible in Python, but I think it would be better to let the database do it. Maybe someone could give me a hint on how to solve this problem. I tried different queries, but I always ended up with results different from what I expected.

To convert a time string to a number of seconds, use the strftime function with the %s format specifier.
(A time string without a date part will be assumed to have the date 2000-01-01, but this cancels out when computing the differences.)
To compute the pause times for a specific production record, use a correlated subquery; the total aggregate is needed to cope with zero/one/multiple matching pauses.
SELECT date,
item,
sum(strftime('%s', end) - strftime('%s', begin) -
(SELECT total(strftime('%s', end) - strftime('%s', begin))
FROM pause
WHERE pause.date = production.date
AND pause.begin >= production.begin
AND pause.end <= production.end)
) AS seconds
FROM production
GROUP BY date,
item
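The query can be checked with Python's sqlite3 module against the sample data from the question. This is the answer's query with two small changes: begin and end are double-quoted because they are SQL keywords, and an ORDER BY is added so the output order is predictable. Note that it only subtracts pauses that lie entirely inside a production interval; a pause that only partly overlaps one is ignored. For Item 1 on 2013-08-01 the result is (13798 - 5400) + 3590 = 11988 seconds. Because total() returns a real, the seconds come back as floats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema and sample rows from the question.
conn.executescript("""
CREATE TABLE production (date TEXT, item TEXT, "begin" TEXT, "end" TEXT);
CREATE TABLE pause (date TEXT, "begin" TEXT, "end" TEXT);
INSERT INTO production VALUES
  ('2013-07-31', 'Item 1', '06:18:00', '08:03:05'),
  ('2013-08-01', 'Item 2', '06:00:03', '10:10:10'),
  ('2013-08-01', 'Item 1', '10:30:15', '14:20:13'),
  ('2013-08-01', 'Item 1', '15:00:10', '16:00:00'),
  ('2013-08-02', 'Item 3', '08:50:00', '15:00:00');
INSERT INTO pause VALUES
  ('2013-08-01', '08:00:00', '08:30:00'),
  ('2013-08-01', '12:00:00', '13:30:00'),
  ('2013-08-02', '10:00:00', '10:30:00'),
  ('2013-08-02', '13:00:00', '14:00:00');
""")

# Correlated subquery: total() gives 0.0 when no pause matches.
QUERY = """
SELECT date,
       item,
       sum(strftime('%s', "end") - strftime('%s', "begin") -
           (SELECT total(strftime('%s', "end") - strftime('%s', "begin"))
            FROM pause
            WHERE pause.date = production.date
              AND pause."begin" >= production."begin"
              AND pause."end" <= production."end")
       ) AS seconds
FROM production
GROUP BY date, item
ORDER BY date, item
"""
rows = conn.execute(QUERY).fetchall()
for date, item, seconds in rows:
    print(date, item, int(seconds))
```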

The best answer I found is:
SELECT
cast(
(
strftime('%s',time_arrived)-strftime('%s',time_departed)
) AS real
)/60/60 AS elapsed
FROM date AS t;
For additional information, check this blog article.
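A minimal check of that expression with Python's sqlite3 module, using a made-up row in a table named like the one in the answer (date, with time_departed and time_arrived columns). The cast to real matters: without it SQLite uses integer division, and 9000 seconds / 60 / 60 would come out as 2 instead of 2.5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table using the names from the answer; "date" is quoted defensively.
conn.execute('CREATE TABLE "date" (time_departed TEXT, time_arrived TEXT)')
conn.execute('INSERT INTO "date" VALUES (?, ?)',
             ("2013-08-01 06:00:00", "2013-08-01 08:30:00"))

# Seconds difference, cast to REAL, then converted to hours.
QUERY = """
SELECT cast(strftime('%s', time_arrived) - strftime('%s', time_departed) AS real)
       / 60 / 60 AS elapsed
FROM "date" AS t
"""
(elapsed,) = conn.execute(QUERY).fetchone()
print(elapsed)
```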