Kusto - Last row by timestamp for every series - azure-data-explorer

I have a table where messages are logged; for every operation there are several messages, each with a timestamp.
I need to get the last message for every operation_id.
Example data:
timestamp | operation_id | message
---------------------------|--------------------------------------------------------
10/2/2019, 10:00:10.000 AM | 1 | message (last msg for this operation id)
10/2/2019, 10:00:00.000 AM | 1 | message
10/2/2019, 10:00:03.000 AM | 2 | message (last msg for this operation id)
10/2/2019, 10:00:00.000 AM | 3 | message
10/2/2019, 10:00:00.000 AM | 2 | message
10/2/2019, 10:00:15.000 AM | 3 | message (last msg for this operation id)
Desired output:
timestamp | operation_id | message
---------------------------|--------------------------------------------------------
10/2/2019, 10:00:10.000 AM | 1 | message (last msg for this operation id)
10/2/2019, 10:00:03.000 AM | 2 | message (last msg for this operation id)
10/2/2019, 10:00:15.000 AM | 3 | message (last msg for this operation id)

Take a look at the aggregation function arg_max()
Note that the applicable examples are in the arg_min() doc...
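To make the semantics concrete, here is a minimal sketch in Python (not KQL) of what arg_max() does when grouped by operation_id: for each key it keeps the whole row whose timestamp is greatest. In Kusto itself the equivalent is along the lines of `summarize arg_max(timestamp, *) by operation_id`. The sample rows and the helper name last_row_per_key are made up for illustration.

```python
from datetime import datetime

# Sample rows mirroring the example data in the question (values are illustrative).
rows = [
    ("2019-10-02 10:00:10", 1, "message (last msg for this operation id)"),
    ("2019-10-02 10:00:00", 1, "message"),
    ("2019-10-02 10:00:03", 2, "message (last msg for this operation id)"),
    ("2019-10-02 10:00:00", 3, "message"),
    ("2019-10-02 10:00:00", 2, "message"),
    ("2019-10-02 10:00:15", 3, "message (last msg for this operation id)"),
]


def last_row_per_key(rows):
    """Keep, for each operation_id, the row with the greatest timestamp."""
    latest = {}
    for ts, op_id, msg in rows:
        parsed = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        if op_id not in latest or parsed > latest[op_id][0]:
            latest[op_id] = (parsed, op_id, msg)
    # Return the surviving rows ordered by operation_id.
    return [latest[key] for key in sorted(latest)]


result = last_row_per_key(rows)
for ts, op_id, msg in result:
    print(ts.isoformat(sep=" "), op_id, msg)
```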

Related

Why SHOW ERRORS or SHOW WARNINGS lists only the very last error/warning?

I am using the SHOW ERRORS and SHOW WARNINGS statements (on MariaDB 10.10.2). Even though max_error_count is at its default value of 64, these statements only list the very last error/warning.
Question
How can I list all the most recent errors/warnings, up to the max_error_count limit?
The SQL statements SHOW ERRORS and SHOW WARNINGS only return errors or warnings for the last statement that failed or produced a warning. They can be retrieved until another statement produces an error/warning.
If a statement produced more than one error/warning, SHOW ERRORS/WARNINGS returns up to max_error_count of them.
Example:
delimiter $$
CREATE PROCEDURE p1() BEGIN DROP TABLE whichdoesnotexist; END $$
CREATE PROCEDURE p2() BEGIN CALL p1(); END $$
CREATE PROCEDURE p3() BEGIN CALL p2(); END $$
delimiter ;
SET @@max_error_count=2;
CALL p3();
SHOW WARNINGS;
+-------+------+----------------------------------------+
| Level | Code | Message |
+-------+------+----------------------------------------+
| Error | 1051 | Unknown table 'test.whichdoesnotexist' |
| Note | 4094 | At line 2 in test.p1 |
+-------+------+----------------------------------------+
SET @@max_error_count=4;
CALL p3();
SHOW WARNINGS;
+-------+------+----------------------------------------+
| Level | Code | Message |
+-------+------+----------------------------------------+
| Error | 1051 | Unknown table 'test.whichdoesnotexist' |
| Note | 4094 | At line 2 in test.p1 |
| Note | 4094 | At line 2 in test.p2 |
| Note | 4094 | At line 2 in test.p3 |
+-------+------+----------------------------------------+
# This will not clear warning/errors
SELECT "1","2","3";
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
| 1 | 2 | 3 |
+---+---+---+
SHOW WARNINGS;
+-------+------+----------------------------------------+
| Level | Code | Message |
+-------+------+----------------------------------------+
| Error | 1051 | Unknown table 'test.whichdoesnotexist' |
| Note | 4094 | At line 2 in test.p1 |
| Note | 4094 | At line 2 in test.p2 |
| Note | 4094 | At line 2 in test.p3 |
+-------+------+----------------------------------------+
SHOW ERRORS and SHOW WARNINGS only show errors/warnings from the most recent statement in the same session. They will never show anything from previous statements or other sessions. max_error_count will reduce the number shown, but never cause anything else to be shown.
MariaDB [test]> select version();
+-----------------------------------------+
| version() |
+-----------------------------------------+
| 10.10.2-MariaDB-1:10.10.2+maria~ubu2204 |
+-----------------------------------------+
1 row in set (0.001 sec)
MariaDB [test]> select date(d) from (select '' d union all select 20220229) d;
+---------+
| date(d) |
+---------+
| NULL |
| NULL |
+---------+
2 rows in set, 2 warnings (0.002 sec)
MariaDB [test]> show warnings;
+---------+------+--------------------------------------+
| Level | Code | Message |
+---------+------+--------------------------------------+
| Warning | 1292 | Incorrect datetime value: '' |
| Warning | 1292 | Incorrect datetime value: '20220229' |
+---------+------+--------------------------------------+
2 rows in set (0.000 sec)
MariaDB [test]> set session max_error_count=1;
Query OK, 0 rows affected (0.001 sec)
MariaDB [test]> select date(d) from (select '' d union all select 20220229) d;
+---------+
| date(d) |
+---------+
| NULL |
| NULL |
+---------+
2 rows in set, 2 warnings (0.001 sec)
MariaDB [test]> show warnings;
+---------+------+------------------------------+
| Level | Code | Message |
+---------+------+------------------------------+
| Warning | 1292 | Incorrect datetime value: '' |
+---------+------+------------------------------+
1 row in set (0.000 sec)

Grouping similar column string values

I have a table in Azure Log Analytics where messages are logged.
There aren't many distinct messages, but each one contains a variable part, such as a user id or a timestamp.
I need to count the distinct message types grouped by one hour intervals, ignoring the variable elements in every message (UUID and timestamp in this case).
I don't know all the message types.
I cannot touch anything else, I am forced to work with this table.
Example data:
timestamp | message
----------|--------------------------------------------------------
| Message type A for user id 993215f6-c42a-4957-bd55-78d71306a8d0
| Message type A for user id 60e7d02c-770a-4641-b379-6bd33fcd563c
| Message type A for user id 5bf7646c-092b-4e20-ba43-de7fe01010ea
| Another message type containing timestamp hh:mm:ss
| Another message type containing timestamp hh:mm:ss
| Another message type containing timestamp hh:mm:ss
| Type C message <variable_string>
Desired output:
timestamp | distinct_message | count
----------------------------|--------------------------------------------|------
10/2/2019, 10:00:00.000 AM | Message type A for user id | 25
10/2/2019, 10:00:00.000 AM | Another message type containing timestamp | 13
10/2/2019, 10:00:00.000 AM | Type C message | 0
10/2/2019, 11:00:00.000 AM | Message type A for user id | 4
10/2/2019, 11:00:00.000 AM | Another message type containing timestamp | 6
10/2/2019, 11:00:00.000 AM | Type C message | 2
This is what I've managed to create, but my knowledge of KQL is quite limited.
let regex_uid = "[[:xdigit:]]+-[[:xdigit:]]+-[[:xdigit:]]+-[[:xdigit:]]+-[[:xdigit:]]+";
traces
| where timestamp > ago(1d)
| extend message = replace(regex_uid, "", message)
| extend message = replace("[0-9]+", "", message)
| extend message = iif(message startswith "Type C message", "Type C message", message )
| project timestamp, message, operation_Name
| summarize count(operation_Name) by bin(timestamp, 1h), message
Is there any better way to do this?
Another option to consider is the reduce operator: https://learn.microsoft.com/en-us/azure/kusto/query/reduceoperator
The output won't be identical to the one in your question, though if I understand your intention correctly, it follows the same principles.
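For comparison, the regex-based normalisation from the question can be sketched outside Kusto. The Python below is only an illustration of that approach (it does not reproduce what reduce does): it strips UUID-like tokens and digit runs, collapses "Type C" messages, and counts the results per one-hour bin. The sample rows and the function names normalize and count_by_hour are assumptions made for the example.

```python
import re
from collections import Counter
from datetime import datetime

# Five hex groups joined by hyphens, like the regex in the question.
UUID_RE = re.compile(r"[0-9a-fA-F]+(?:-[0-9a-fA-F]+){4}")
DIGITS_RE = re.compile(r"[0-9]+")


def normalize(message):
    """Remove the variable parts so that similar messages share one key."""
    message = UUID_RE.sub("", message)
    message = DIGITS_RE.sub("", message)
    if message.startswith("Type C message"):
        message = "Type C message"
    return message.strip()


def count_by_hour(rows):
    """Count normalised messages per (hour bin, message) pair."""
    counts = Counter()
    for ts, message in rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(hour, normalize(message))] += 1
    return counts


# Illustrative rows only; real data would come from the traces table.
rows = [
    (datetime(2019, 10, 2, 10, 5), "Message type A for user id 993215f6-c42a-4957-bd55-78d71306a8d0"),
    (datetime(2019, 10, 2, 10, 40), "Message type A for user id 60e7d02c-770a-4641-b379-6bd33fcd563c"),
    (datetime(2019, 10, 2, 11, 1), "Another message type containing timestamp 11:01:00"),
    (datetime(2019, 10, 2, 11, 2), "Type C message foo bar"),
]

counts = count_by_hour(rows)
for (hour, message), n in sorted(counts.items()):
    print(hour, message, n)
```

Note that removing only the digits leaves the colons of a timestamp behind, exactly as the replace() calls in the question would, so a stricter pattern may be preferable for time values.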

How to obtain distinct values based on another column in the same table?

I'm not sure how to word the title properly so sorry if it wasn't clear at first.
What I want to do is to find users that have logged into a specific page, but not the other.
The table I have looks like this:
Users_Logins
------------------------------------------------------
| IDLogin | Username | Page | Date | Hour |
|---------|----------|-------|------------|----------|
| 1 | User_1 | Url_1 | 2019-05-11 | 11:02:51 |
| 2 | User_1 | Url_2 | 2019-05-11 | 14:16:21 |
| 3 | User_2 | Url_1 | 2019-05-12 | 08:59:48 |
| 4 | User_2 | Url_1 | 2019-05-12 | 16:36:27 |
| ... | ... | ... | ... | ... |
------------------------------------------------------
So as you can see, User 1 logged into Url 1 and 2, but User 2 logged into Url 1 only.
How should I go about finding users that logged into Url 1, but never logged into Url 2 during a certain period of time?
Thanks in advance!
I will try to improve the title of your question later, but for the time being, this is how I accomplished what you are asking for:
Query:
select distinct username from Users_Logins
where page = 'Url_1'
and username not in
(select username from Users_Logins
where Page = 'Url_2')
and date BETWEEN '2019-05-12' AND '2019-05-12'
and hour BETWEEN '00:00:00' AND '12:00:00';
Returns:
User_2
Comments:
I basically used a sub query to filter out the usernames you don't care about. :)
The time range is getting only 1 result, which you can test by removing the "distinct" in the first line of the query. If you then remove the time range from the query, you'll get 2 results.
You can do it with group by username and apply the conditions in a HAVING clause:
select username
from Users_Logins
where
date between '..........' and '..........'
and
hour between '..........' and '..........'
group by username
having
sum(page = 'Url_1') > 0
and
sum(page = 'Url_2') = 0;
Replace the dots with the date/time intervals you want.
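As a quick check of the GROUP BY / HAVING approach, here is a small sketch that runs it against an in-memory SQLite database from Python. It assumes SQLite behaves like MySQL in treating a comparison such as Page = 'Url_1' as 0 or 1 inside SUM(), loads only the four sample rows from the question, and filters on the date only; the date range is an arbitrary choice for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users_Logins ("
    "IDLogin INTEGER PRIMARY KEY, Username TEXT, Page TEXT, Date TEXT, Hour TEXT)"
)
# The four sample rows shown in the question.
conn.executemany(
    "INSERT INTO Users_Logins VALUES (?, ?, ?, ?, ?)",
    [
        (1, "User_1", "Url_1", "2019-05-11", "11:02:51"),
        (2, "User_1", "Url_2", "2019-05-11", "14:16:21"),
        (3, "User_2", "Url_1", "2019-05-12", "08:59:48"),
        (4, "User_2", "Url_1", "2019-05-12", "16:36:27"),
    ],
)

# Users with at least one Url_1 login and no Url_2 login in the period.
query = """
SELECT Username
FROM Users_Logins
WHERE Date BETWEEN ? AND ?
GROUP BY Username
HAVING SUM(Page = 'Url_1') > 0
   AND SUM(Page = 'Url_2') = 0
"""
users = [row[0] for row in conn.execute(query, ("2019-05-11", "2019-05-12"))]
print(users)
```

Because both conditions are evaluated over the same filtered rows, "never logged into Url_2" here means "not within the chosen period", which is what the question asks for.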

How is availability zone list order determined by the nova api in openstack?

I want to change the default option for availability zone in my openstack setup in horizon. However, I am having trouble finding out what determines the order of the availability zones as returned by the nova api. For example, running openstack availability zone list I get:
+--------------+-------------+
| Zone Name | Zone Status |
+--------------+-------------+
| zone2 | available |
| zone1 | available |
| internal | available |
| zone3 | available |
+--------------+-------------+
which is the same order as in horizon's dropdown box. However, querying the database directly, I get:
mysql> select * from aggregate_metadata;
+---------------------+------------+------------+----+--------------+-------------------+--------------+---------+
| created_at | updated_at | deleted_at | id | aggregate_id | key | value | deleted |
+---------------------+------------+------------+----+--------------+-------------------+--------------+---------+
| 2015-06-12 08:43:07 | NULL | NULL | 1 | 1 | availability_zone | zone1 | 0 |
| 2015-06-12 08:43:08 | NULL | NULL | 2 | 2 | availability_zone | zone2 | 0 |
| 2015-10-26 05:30:15 | NULL | NULL | 3 | 3 | availability_zone | zone3 | 0 |
+---------------------+------------+------------+----+--------------+-------------------+--------------+---------+
3 rows in set (0.00 sec)
Obviously, the openstack api is doing some sorting before returning the result... however, I can't figure out how it is being sorted nor how I could control the sorting.
get_availability_zones is the function the nova API uses to collect the list of availability zones.
This function gets the list of available services (sorted by id) and then adds the availability zone name to each of those services.
Since the service list is the starting point, the service id defines the order, not the zone name.
The sort order can be modified in different ways based on the requirement.
Sort the order at frontend (horizon)
Modify this line with
ng-options="zone.value as zone.label for zone in model.availabilityZones | orderBy:'value'"
Sort the order at backend (nova-api)
Add available_zones.sort() and not_available_zones.sort() before the return statements in the get_availability_zones function.
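The effect of both the default ordering and the proposed sort() call can be sketched in plain Python. This is not nova's actual code: the (service id, zone) pairs and the helper zones_in_service_order are hypothetical, chosen only so that the first result matches the listing in the question.

```python
# Hypothetical (service_id, zone) pairs; real values would come from nova's services.
services = [
    (1, "zone2"),
    (2, "zone1"),
    (3, "internal"),
    (4, "zone2"),
    (5, "zone3"),
]


def zones_in_service_order(services):
    """Collect zone names in the order their services appear when sorted by id."""
    zones = []
    for _service_id, zone in sorted(services):
        if zone not in zones:
            zones.append(zone)
    return zones


available_zones = zones_in_service_order(services)
unsorted_zones = list(available_zones)
print(unsorted_zones)   # order follows the service ids

available_zones.sort()  # the change suggested for the backend
print(available_zones)  # alphabetical order
```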

Performance issue on CLOB column

I'm facing an issue with a table that has a CLOB column.
The table has just 15 columns, one of which is a CLOB.
When I run a SELECT on the table excluding the CLOB column, it takes only 15 minutes, but if I include this column, the SELECT query runs for 2 hours.
I have checked the plan and found that the queries with and without the CLOB column use the same plan.
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 330K| 61M| 147K (1)| 00:29:34 | | |
| 1 | PARTITION RANGE ALL | | 330K| 61M| 147K (1)| 00:29:34 | 1 | 50 |
| 2 | TABLE ACCESS BY LOCAL INDEX ROWID| CC_CONSUMER_EV_PRFL | 330K| 61M| 147K (1)| 00:29:34 | 1 | 50 |
|* 3 | INDEX RANGE SCAN | CC_CON_EV_P_EV_TYPE_BTIDX | 337K| | 811 (1)| 00:00:10 | 1 | 50 |
Below are the stats I collected.
Stats                                  | Without CLOB column | With CLOB column
---------------------------------------|---------------------|-----------------
recursive calls                        | 0                   | 1
db block gets                          | 0                   | 0
consistent gets                        | 1374615             | 3131269
physical reads                         | 103874              | 1042358
redo size                              | 0                   | 0
bytes sent via SQL*Net to client       | 449499347           | 3209044367
bytes received via SQL*Net from client | 1148445             | 1288482930
SQL*Net roundtrips to/from client      | 104373              | 2215166
sorts (memory)                         |                     |
sorts (disk)                           |                     |
rows processed                         | 1565567             | 1565567
I'm planning to try the following. Is it worth trying?
1) Gather stats on the table and retry
2) compress the table and retry
