Grouping similar column string values


I have a table in Azure Log Analytics where messages are logged.
There aren't many distinct messages, but every one contains a variable part such as a user ID or a timestamp.
I need to count the messages of each distinct type in one-hour intervals, ignoring the variable elements in every message (a UUID and a timestamp in this case).
I don't know all the message types in advance.
I cannot change anything else; I am forced to work with this table.
Example data:
timestamp | message
----------|--------------------------------------------------------
| Message type A for user id 993215f6-c42a-4957-bd55-78d71306a8d0
| Message type A for user id 60e7d02c-770a-4641-b379-6bd33fcd563c
| Message type A for user id 5bf7646c-092b-4e20-ba43-de7fe01010ea
| Another message type containing timestamp hh:mm:ss
| Another message type containing timestamp hh:mm:ss
| Another message type containing timestamp hh:mm:ss
| Type C message <variable_string>
Desired output:
timestamp | distinct_message | count
----------------------------|--------------------------------------------|------
10/2/2019, 10:00:00.000 AM | Message type A for user id | 25
10/2/2019, 10:00:00.000 AM | Another message type containing timestamp | 13
10/2/2019, 10:00:00.000 AM | Type C message | 0
10/2/2019, 11:00:00.000 AM | Message type A for user id | 4
10/2/2019, 11:00:00.000 AM | Another message type containing timestamp | 6
10/2/2019, 11:00:00.000 AM | Type C message | 2
This is what I've managed to create, but my knowledge of KQL is quite limited.
let regex_uid = "[[:xdigit:]]+-[[:xdigit:]]+-[[:xdigit:]]+-[[:xdigit:]]+-[[:xdigit:]]+";
traces
| where timestamp > ago(1d)
| extend message = replace(regex_uid, "", message)
| extend message = replace("[0-9]+", "", message)
| extend message = iif(message startswith "Type C message", "Type C message", message )
| project timestamp, message, operation_Name
| summarize count(operation_Name) by bin(timestamp, 1h), message
Is there any better way to do this?

Another option for you to consider is the reduce operator: https://learn.microsoft.com/en-us/azure/kusto/query/reduceoperator
The output won't be identical to the one in your question, though if I understand your intention correctly, it follows the same principles.
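To make the grouping idea concrete outside Kusto, here is a minimal Python sketch of the replace-then-count approach from the question. It is not what the reduce operator does internally. The two regular expressions and the `<uuid>` / `<time>` placeholders are assumptions based on the sample messages, and the sample rows are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed shapes of the variable parts (taken from the sample messages).
UUID_RE = re.compile(r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}")
TIME_RE = re.compile(r"\b\d{2}:\d{2}:\d{2}\b")


def normalize(message: str) -> str:
    """Replace variable parts with placeholders so similar messages collapse."""
    message = UUID_RE.sub("<uuid>", message)
    message = TIME_RE.sub("<time>", message)
    return message.strip()


def count_by_hour(rows):
    """Count normalized messages per one-hour bin, like bin(timestamp, 1h)."""
    counts = Counter()
    for ts, message in rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(hour, normalize(message))] += 1
    return counts


# Hypothetical sample rows, not real log data.
rows = [
    (datetime(2019, 10, 2, 10, 5), "Message type A for user id 993215f6-c42a-4957-bd55-78d71306a8d0"),
    (datetime(2019, 10, 2, 10, 40), "Message type A for user id 60e7d02c-770a-4641-b379-6bd33fcd563c"),
    (datetime(2019, 10, 2, 10, 41), "Another message type containing timestamp 10:41:07"),
    (datetime(2019, 10, 2, 11, 2), "Message type A for user id 5bf7646c-092b-4e20-ba43-de7fe01010ea"),
]
counts = count_by_hour(rows)
for (hour, message), n in sorted(counts.items()):
    print(hour, "|", message, "|", n)
```

Keeping a placeholder instead of deleting the variable part makes it easier to see afterwards what was stripped from each message type.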

Related

Kusto query for grouping AppInsights messages

I need to group the messages in Azure AppInsights by whether they contain particular substrings, and to count the messages in each group.
In the end, the grouping would look like this:
messages count
-------- -------
foomessages <say, 300>
barmessages <say, 450>
:
:
where
foomessages = All messages containing the substring "foo" etc.
How can I construct a query for this?

datatable(log: string) [
"hello world",
"this is a test",
"this is a world test",
"another test"
]
| summarize
LogsWithWorld = countif(log has "world"),
LogsWithTest = countif(log has "test")
| project Result = pack_all()
| mv-expand Result
| extend Message = tostring(bag_keys(Result)[0])
| extend Count = tolong(Result[Message])
| project Message, Count
The produced result is:
| Message       | Count |
|---------------|-------|
| LogsWithWorld | 2     |
| LogsWithTest  | 3     |
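For readers who want to check the counting logic without a Kusto cluster, the same conditional counts can be sketched in Python. Note that KQL's `has` matches whole terms rather than arbitrary substrings (`contains` does the latter); the `has_term` helper below is a hypothetical and only rough imitation of that term matching.

```python
import re

logs = [
    "hello world",
    "this is a test",
    "this is a world test",
    "another test",
]


def has_term(text: str, term: str) -> bool:
    """Rough stand-in for KQL `has`: case-insensitive whole-term match."""
    return term.lower() in re.findall(r"[a-z0-9]+", text.lower())


# One conditional count per group, like countif(log has "...").
counts = {
    "LogsWithWorld": sum(has_term(log, "world") for log in logs),
    "LogsWithTest": sum(has_term(log, "test") for log in logs),
}

# One (Message, Count) row per group, like the pack_all() / mv-expand step.
result_rows = list(counts.items())
for message, count in result_rows:
    print(message, count)
```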

Kusto - Last row by timestamp for every series

I have a table where messages are logged; for every operation there are several messages, each with a timestamp.
I need to get the last message for every operation_id.
Example data:
timestamp | operation_id | message
---------------------------|--------------------------------------------------------
10/2/2019, 10:00:10.000 AM | 1 | message (last msg for this operation id)
10/2/2019, 10:00:00.000 AM | 1 | message
10/2/2019, 10:00:03.000 AM | 2 | message (last msg for this operation id)
10/2/2019, 10:00:00.000 AM | 3 | message
10/2/2019, 10:00:00.000 AM | 2 | message
10/2/2019, 10:00:15.000 AM | 3 | message (last msg for this operation id)
Desired output:
timestamp | operation_id | message
---------------------------|--------------------------------------------------------
10/2/2019, 10:00:10.000 AM | 1 | message (last msg for this operation id)
10/2/2019, 10:00:03.000 AM | 2 | message (last msg for this operation id)
10/2/2019, 10:00:15.000 AM | 3 | message (last msg for this operation id)

Take a look at the arg_max() aggregation function.
Note that the applicable examples are in the arg_min() documentation.
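In KQL this would typically take the form `summarize arg_max(timestamp, *) by operation_id`. As a sketch of what that aggregation computes, the Python below keeps, for each operation id, the row with the greatest timestamp. The rows mirror the example data in the question; the function name is made up for illustration.

```python
from datetime import datetime

# (timestamp, operation_id, message), mirroring the example data.
rows = [
    (datetime(2019, 10, 2, 10, 0, 10), 1, "message (last msg for this operation id)"),
    (datetime(2019, 10, 2, 10, 0, 0), 1, "message"),
    (datetime(2019, 10, 2, 10, 0, 3), 2, "message (last msg for this operation id)"),
    (datetime(2019, 10, 2, 10, 0, 0), 3, "message"),
    (datetime(2019, 10, 2, 10, 0, 0), 2, "message"),
    (datetime(2019, 10, 2, 10, 0, 15), 3, "message (last msg for this operation id)"),
]


def last_row_per_operation(rows):
    """Return, per operation id, the whole row with the latest timestamp."""
    latest = {}
    for row in rows:
        ts, op_id, _ = row
        if op_id not in latest or ts > latest[op_id][0]:
            latest[op_id] = row
    return [latest[op_id] for op_id in sorted(latest)]


last_rows = last_row_per_operation(rows)
for ts, op_id, message in last_rows:
    print(ts, "|", op_id, "|", message)
```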

I want to find the day difference between 2 date column in azure app insight?

We have a log file where we store the searches happening on our platform. Each search has a departure date, and I want to find the searches where the departure date is more than 330 days from today.
I am trying to run a query that finds the difference between the departure date column and logTime (the time the event was entered into the log), but I get the error below:
Query could not be parsed at 'datetime("departureDate")' on line [5,54]
Token: datetime("departureDate")
Line: 5
Position: 54
The departure date is in mm/dd/yyyy format, and logTime is in the typical App Insights datetime format.
The query I am running is below:
customEvents
| where name == "SearchLog"
| extend departureDate = tostring(customDimensions.departureDate)
| extend logTime = tostring(customDimensions.logTime)
| where datetime_diff('day',datetime("departureDate"),datetime("logTime")) > 200
As suggested, I ran the query below, but now I get 0 results even though there is data that satisfies the given criteria.
customEvents
| where name == "SearchLog"
| extend departureDate = tostring(customDimensions.departureDate)
| extend logTime = tostring(customDimensions.logTime)
| where datetime_diff('day',todatetime(departureDate),todatetime(logTime)) > 200
Example:
departureDate
04/09/2020
logTime
8/13/2019 8:45:39 AM -04:00
I also tried the query below to check whether the date format is supported, and it gave the correct response.
customEvents
| project datetime_diff('day', datetime('04/30/2020'),datetime('8/13/2019 8:25:51 AM -04:00'))

Please use the query below. It uses the todatetime() function to convert the strings to datetime values.
customEvents
| where name == "SearchLog"
| extend departureDate = tostring(customDimensions.departureDate)
| extend logTime = tostring(customDimensions.logTime)
| where datetime_diff('day',todatetime(departureDate),todatetime(logTime)) > 200
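As a sanity check on the example values from the question, the conversion and day difference can be reproduced in Python. This is only a sketch: the two format strings are assumptions derived from the sample values, and comparing calendar dates is an approximation of what datetime_diff('day', ...) returns.

```python
from datetime import datetime

# Sample values copied from the question.
departure_text = "04/09/2020"
log_time_text = "8/13/2019 8:45:39 AM -04:00"

# Both values are strings and must be parsed before they can be compared.
departure = datetime.strptime(departure_text, "%m/%d/%Y")
log_time = datetime.strptime(log_time_text, "%m/%d/%Y %I:%M:%S %p %z")

# Difference in calendar days between the two dates.
day_diff = (departure.date() - log_time.date()).days
matches_filter = day_diff > 200

print(day_diff, matches_filter)
```

With these values the row passes the `> 200` filter, so the sample data should be matched once both strings are converted.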

The double quotes inside datetime() in the where clause should be removed; with the quotes, the column names are treated as string literals.
Your code should look like:
where datetime_diff('day',datetime(departureDate),datetime(logTime)) > 200

How to obtain distinct values based on another column in the same table?

I'm not sure how to word the title properly so sorry if it wasn't clear at first.
What I want to do is to find users that have logged into a specific page, but not the other.
The table I have looks like this:
Users_Logins
------------------------------------------------------
| IDLogin | Username | Page | Date | Hour |
|---------|----------|-------|------------|----------|
| 1 | User_1 | Url_1 | 2019-05-11 | 11:02:51 |
| 2 | User_1 | Url_2 | 2019-05-11 | 14:16:21 |
| 3 | User_2 | Url_1 | 2019-05-12 | 08:59:48 |
| 4 | User_2 | Url_1 | 2019-05-12 | 16:36:27 |
| ... | ... | ... | ... | ... |
------------------------------------------------------
So as you can see, User 1 logged into Url 1 and 2, but User 2 logged into Url 1 only.
How should I go about finding users that logged into Url 1, but never logged into Url 2 during a certain period of time?
Thanks in advance!

I will try to improve the title of your question later, but for the time being, this is how I accomplished what you are asking for:
Query:
select distinct username from Users_Logins
where page = 'Url_1'
and username not in
    (select username from Users_Logins
     where page = 'Url_2')
and date between '2019-05-12' and '2019-05-12'
and hour between '00:00:00' and '12:00:00';
Returns:
User_2
Comments:
I basically used a sub query to filter out the usernames you don't care about. :)
The time range is getting only 1 result, which you can test by removing the "distinct" in the first line of the query. If you then remove the time range from the query, you'll get 2 results.

You can do it with group by username and apply the conditions in a HAVING clause:
select username
from Users_Logins
where
    date between '..........' and '..........'
    and
    hour between '..........' and '..........'
group by username
having
    sum(page = 'Url_1') > 0
    and
    sum(page = 'Url_2') = 0;
Replace the dots with the date/time intervals you want.
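Both approaches can be tried against the sample rows with an in-memory SQLite database. This is a sketch under assumptions: the table and column names come from the question, the date period is picked for illustration, and the hour filter is left out for brevity. `sum(page = '...')` relies on comparisons evaluating to 0 or 1, which holds in SQLite and MySQL but not in every database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users_Logins ("
    "IDLogin INTEGER PRIMARY KEY, Username TEXT, Page TEXT, Date TEXT, Hour TEXT)"
)
conn.executemany(
    "INSERT INTO Users_Logins VALUES (?, ?, ?, ?, ?)",
    [
        (1, "User_1", "Url_1", "2019-05-11", "11:02:51"),
        (2, "User_1", "Url_2", "2019-05-11", "14:16:21"),
        (3, "User_2", "Url_1", "2019-05-12", "08:59:48"),
        (4, "User_2", "Url_1", "2019-05-12", "16:36:27"),
    ],
)

# Subquery approach: exclude anyone who ever logged into Url_2.
not_in_query = """
SELECT DISTINCT Username FROM Users_Logins
WHERE Page = 'Url_1'
  AND Username NOT IN (SELECT Username FROM Users_Logins WHERE Page = 'Url_2')
  AND Date BETWEEN ? AND ?
"""

# Conditional aggregation: count logins per page within the period.
having_query = """
SELECT Username FROM Users_Logins
WHERE Date BETWEEN ? AND ?
GROUP BY Username
HAVING SUM(Page = 'Url_1') > 0 AND SUM(Page = 'Url_2') = 0
"""

period = ("2019-05-11", "2019-05-12")  # illustrative period
not_in_result = conn.execute(not_in_query, period).fetchall()
having_result = conn.execute(having_query, period).fetchall()
print(not_in_result, having_result)
```

The two queries differ in one respect: the subquery version excludes users who logged into Url_2 at any time, while the HAVING version only considers logins inside the selected period.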

Sumtotal in ReportViewer

+----------+------------+------+------+--------------+---------+---------+
| | SUBJ | MIN | MAX | RESULT | STATUS | PERCENT |
| +------------+------+------+--------------+---------+---------+
| | Subj1 | 35 | 100 | 13 | FAIL | 13.00% |
|EXAM NAME | Subj2 | 35 | 100 | 63 | PASS | 63.00% |
| | Subj3 | 35 | 100 | 35 | PASS | 35.00% |
| +------------+------+------+--------------+---------+---------+
| | Total | 105 | 300 | 111 | PASS | 37.00% |
+----------+------------+------+------+--------------+---------+---------+
This is my Report Viewer report format. The SubTotal row shows the
total of each column above it. Everything is fine, except that the
Status column of the SubTotal row shows PASS. I want it to show FAIL if
there is a single FAIL in the Status column. I generate Status as
follows: if Result < Min it is a fail, otherwise it is a pass. How can I
change the SubTotal row depending on this condition? And is there any
way to show the SubTotal row directly from the database? Any suggestions?

The easiest way to do this would be to use custom code (right-click non-display area of report, choose Properties and click the Code tab) - calculate the pass/fail score in the detail, display it in the group footer and reset it in the group header:
' Overall status for the current group; Public so the report can read it as Code.PassFail
Public PassFail As String
' Reset Pass or Fail status in group header
Public Function ResetAndDisplayStatusTitle() As String
    PassFail = "PASS" ' Initialise status to pass
    ResetAndDisplayStatusTitle = "Status"
End Function
' Calculate pass/fail on each detail row and remember any fails.
' The field values are passed in as arguments because custom code cannot read the Fields collection directly.
Public Function CalculatePassFail(ByVal Result As Double, ByVal Min As Double) As String
    Dim ThisResult As String
    ' Calculate whether this result is pass or fail
    If Result < Min Then
        ThisResult = "FAIL"
    Else
        ThisResult = "PASS"
    End If
    ' Remember any failure as overall failure
    If ThisResult = "FAIL" Then PassFail = "FAIL"
    CalculatePassFail = ThisResult
End Function
Then you tie in the custom code to your cells in your table as follows:
In the value for the status column in your group header you put:
=Code.ResetAndDisplayStatusTitle()
In the value for the status column in the detail row you put:
=Code.CalculatePassFail(Fields!Result.Value, Fields!Min.Value)
In the value for the status column in the group footer you put:
=Code.PassFail
With respect to getting the subtotal row directly from the database, there are a couple of ways, depending on what result you are after:
Join the detail row to a subtotalling row in your SQL (so that the subtotal fields appear on every row in the dataset) and use those fields.
Again, use custom code (but this is probably overly complicated for subtotalling).
However, these tricks are only for strange circumstances and in general the normal out-of-the-box subtotalling can be tweaked to give the result you are after. If there is something specific you want to know, it is probably best to explain the problem in a separate question so that issue can be dealt with individually.
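The rule the custom code implements (a row fails when Result < Min, and the total fails as soon as any row fails) can be checked in isolation with a short Python sketch. The function names are made up for illustration; the values come from the report in the question.

```python
def row_status(result: float, minimum: float) -> str:
    """A single subject fails when its result is below the minimum."""
    return "FAIL" if result < minimum else "PASS"


def overall_status(rows) -> str:
    """The total row fails as soon as any subject fails."""
    failed = any(row_status(result, minimum) == "FAIL" for result, minimum in rows)
    return "FAIL" if failed else "PASS"


# (Result, Min) for Subj1..Subj3 from the report in the question.
rows = [(13, 35), (63, 35), (35, 35)]
statuses = [row_status(result, minimum) for result, minimum in rows]
overall = overall_status(rows)
print(statuses, overall)
```

This also shows why comparing the totals (111 against 105) gives PASS even though one subject failed.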
