Kusto/KQL: summarize by time bucket AND count(string) column

I have a table of HTTP responses, including timestamp, service name, and HTTP response code, that I want to query using KQL/Kusto.
My goal is a table that tells me "How many HTTP responses of a certain type (2xx, 4xx, etc.) did a particular service have within each 5-minute window over time?"
I want to summarize the rows by a time bucket of 5 minutes and by the ResponseType (basically the response code class) as well, but I can't seem to make it work. When I add count(ResponseType) to the summarize clause, it returns the error message Function 'count' cannot be invoked in current context.
My KQL looks like this:
InsightsMetrics
| extend Tags = parse_json(Tags)
| extend Responsecode = tostring(Tags.["code"])
| extend ResponseType = strcat(substring(Responsecode, 0, 1), "XX")
| extend Service = tostring(Tags.["service"])
| where TimeGenerated >= now(-4h)
| where Namespace == "prometheus"
| where Name contains "traefik_service_requests_total"
| project TimeGenerated, Responsecode, Service, ResponseType
| summarize by bin(TimeGenerated, 5m), ResponseType
which returns data like this:
| TimeGenerated | ResponseType | Service |
|---------------------|--------------|----------------------------------------------------------|
| 2020-10-01 10:25:00 | 3XX | prod-service-internal-50f0bab542c7d81ed22e#kubernetescrd |
| 2020-10-01 10:30:00 | 2XX | prod-service-internal-50f0bab542c7d81ed22e#kubernetescrd |
| 2020-10-01 10:30:00 | 2XX | prod-service-internal-50f0bab542c7d81ed22e#kubernetescrd |
| 2020-10-01 10:30:00 | 4XX | prod-service-internal-50f0bab542c7d81ed22e#kubernetescrd |
whereas I want something like this instead:
| TimeGenerated | ResponseType | count(ResponseType) | Service |
|---------------------|--------------|---------------------|----------------------------------------------------------|
| 2020-10-01 10:25:00 | 3XX | 1 | prod-service-internal-50f0bab542c7d81ed22e#kubernetescrd |
| 2020-10-01 10:30:00 | 2XX | 2 | prod-service-internal-50f0bab542c7d81ed22e#kubernetescrd |
| 2020-10-01 10:30:00 | 4XX | 1 | prod-service-internal-50f0bab542c7d81ed22e#kubernetescrd |

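All you have to do is replace
| summarize by bin(TimeGenerated, 5m), ResponseType
with
| summarize count() by bin(TimeGenerated, 5m), ResponseType, Service
count() takes no argument: it counts the rows in each group. Without an aggregate, summarize by only returns the distinct combinations of the grouping columns. Adding Service to the by clause keeps the service in the output.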
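To make the grouping concrete, here is a minimal Python sketch of what counting per 5-minute bucket, response class, and service amounts to. The sample rows, the service name `svc-a`, and the helper names `bucket_start` and `count_by_bucket` are made up for illustration; they are not part of the original query or of any Kusto API.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_start(ts: datetime, width: timedelta) -> datetime:
    """Round ts down to the start of its bucket (similar in spirit to bin())."""
    epoch = datetime(1970, 1, 1)
    return ts - (ts - epoch) % width


def count_by_bucket(rows, width=timedelta(minutes=5)):
    """Count rows per (bucket start, response class, service)."""
    counts = Counter()
    for ts, code, service in rows:
        response_type = code[:1] + "XX"  # "404" -> "4XX"
        counts[(bucket_start(ts, width), response_type, service)] += 1
    return counts


# Hypothetical sample data: (timestamp, response code, service).
rows = [
    (datetime(2020, 10, 1, 10, 27, 13), "301", "svc-a"),
    (datetime(2020, 10, 1, 10, 31, 2), "200", "svc-a"),
    (datetime(2020, 10, 1, 10, 34, 59), "204", "svc-a"),
    (datetime(2020, 10, 1, 10, 33, 40), "404", "svc-a"),
]

counts = count_by_bucket(rows)
for (bucket, response_type, service), n in sorted(counts.items()):
    print(bucket, response_type, n, service)
```

Each distinct combination of the three grouping keys gets one counter, which is what listing all three after `by` does in the summarize clause.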

Related

Remove duplicates based on multiple values in R or POWER BI

I have a data set, each line representing a "service visit" for customers. A customer might have between 0 and 5 service calls. If there isn't a service call for someone, the columns associated with a service call would all be empty.
+--------------+-------------------+-------------------+------------------------+---------------------+
| Project Name | Customer Name | Service Call.Name | Service Call Date Time | Service Call Status |
+--------------+-------------------+-------------------+------------------------+---------------------+
| OO-99999 | A | SC-001762 | 3/21/2022 7:00:00 PM | Completed |
| OO-99999 | A | SC-002323 | null | Completed |
| OO-99999 | A | SC-002357 | 10/3/2022 7:00:00 PM | 2nd Visit Scheduled |
| OO-88888 | B | SC-001260 | 2/1/2022 8:00:00 PM | Completed |
| OO-88888 | B | SC-002938 | 8/25/2022 7:00:00 PM | Scheduled |
| OO-55555 | C | SC-000957 | 12/27/2021 8:00:00 PM | Completed |
| OO-55555 | C | SC-001418 | 2/7/2022 4:30:00 PM | Completed |
| OO-55555 | C | SC-003007 | null | null |
| OO-66666 | D | SC-001626 | null | No Longer Required |
| OO-66666 | D | SC-002329 | 6/9/2022 7:00:00 PM | Completed |
| OO-66666 | D | SC-002538 | null | Completed |
| OO-66666 | D | SC-002932 | null | Call Reviewed |
| OO-66666 | D | SC-003350 | 9/29/2022 7:00:00 PM | Scheduled |
| OO-11111 | F | null | null | null |
+--------------+-------------------+-------------------+------------------------+---------------------+
My goal is to filter out duplicates. I only want one row per customer, but I want to keep a specific row. A duplicate only appears if someone has multiple service calls.
If someone has a service call (Service Call.Name not equal to null), and one of those calls has a service call status of something OTHER than "Completed" or "Not required", I want to keep that row. So for Customer A, I want the third row, since its service call status is not "Completed" or "Not required".
If someone has multiple service calls, like customer , and they are all "Completed" or "Not required", I don't care which one I keep, as long as I only keep one.
If someone has one service call or no service call, there will be no duplicate of that person, so I want to keep that row.
EDIT
There were cases of duplicates I didn't realize I had; I've edited the data to show them.
For someone with more than one open service call, like customer E, I only want to keep one of them. If there is a date for both, I want the later date of the two. If one has a date and the other doesn't, I want the one with a date. If neither has a date, I don't care which is kept, but I only want one.
I am working in Power BI, but I have access to R and think that might be easier.

Here is a solution. A logical index created with %in% marks the rows to drop by status (keeping one row when every status for a customer is unwanted), and duplicated then removes repeated statuses within each customer.
dat <- read.table(text = '+--------------+---------------+-------------------+------------------------+---------------------+
| Project Name | Customer Name | Service Call.Name | Service Call Date Time | Service Call Status |
+--------------+---------------+-------------------+------------------------+---------------------+
| OO-99999 | A | SC-001762 | 3/21/2022 7:00:00 PM | Completed |
| OO-99999 | A | SC-002323 | null | Completed |
| OO-99999 | A | SC-002357 | 10/3/2022 7:00:00 PM | 2nd Visit Scheduled |
| OO-88888 | B | SC-001260 | 2/1/2022 8:00:00 PM | Completed |
| OO-88888 | B | SC-002938 | 8/25/2022 7:00:00 PM | Scheduled |
| OO-55555 | C | SC-000957 | 12/27/2021 8:00:00 PM | Completed |
| OO-55555 | C | SC-001418 | 2/7/2022 4:30:00 PM | Completed |
| OO-55555 | C | SC-003007 | null | null |
| OO-11111 | D | null | null | null |
+--------------+---------------+-------------------+------------------------+---------------------+
', header = TRUE, sep = "|", comment.char = "+", strip.white = TRUE, check.names = FALSE)
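# Drop the empty first and last columns produced by the leading and trailing "|".
dat <- dat[-c(1, ncol(dat))]
not_wanted <- c("Completed", "Not required")
# TRUE for rows whose status is one we would rather not keep.
i <- dat[['Service Call Status']] %in% not_wanted
# Per customer: keep rows with any other status; if every row is unwanted, keep the first.
i <- ave(i, dat[['Customer Name']], FUN = \(k) {
  if(all(k)) k[1] <- FALSE
  !k
})
result <- dat[i,]
# Drop repeated statuses within each customer.
j <- ave(result[['Service Call Status']], result[['Customer Name']], FUN = duplicated)
result <- result[!as.logical(j), ]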
result
#> Project Name Customer Name Service Call.Name Service Call Date Time Service Call Status
#> 3 OO-99999 A SC-002357 10/3/2022 7:00:00 PM 2nd Visit Scheduled
#> 5 OO-88888 B SC-002938 8/25/2022 7:00:00 PM Scheduled
#> 8 OO-55555 C SC-003007 null null
#> 9 OO-11111 D null null null
Created on 2022-10-26 with reprex v2.0.2
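For comparison, the selection rule from the question (including the date preference in the EDIT) can be sketched as a ranking per row, keeping the best-ranked row per customer. This is only an illustration in Python, not a translation of the R code above. The field names, the helper names, and the choice to treat a missing status as "open" (which matches the R output for customer C) are assumptions.

```python
from datetime import datetime

CLOSED = {"Completed", "Not required"}


def parse_date(text):
    """Return a datetime, or None when the date is missing."""
    if text is None:
        return None
    return datetime.strptime(text, "%m/%d/%Y %I:%M:%S %p")


def preference(row):
    """Rank a row: open status first, then having a date, then the later date."""
    when = parse_date(row["date"])
    is_open = row["status"] not in CLOSED  # assumption: missing status counts as open
    return (is_open, when is not None, when or datetime.min)


def one_row_per_customer(rows):
    """Keep the highest-ranked row per customer; ties keep the first row seen."""
    best = {}
    for row in rows:
        key = row["customer"]
        if key not in best or preference(row) > preference(best[key]):
            best[key] = row
    return list(best.values())


# Same rows as the data in the answer above, with None for "null".
rows = [
    dict(customer=c, call=n, date=d, status=s)
    for c, n, d, s in [
        ("A", "SC-001762", "3/21/2022 7:00:00 PM", "Completed"),
        ("A", "SC-002323", None, "Completed"),
        ("A", "SC-002357", "10/3/2022 7:00:00 PM", "2nd Visit Scheduled"),
        ("B", "SC-001260", "2/1/2022 8:00:00 PM", "Completed"),
        ("B", "SC-002938", "8/25/2022 7:00:00 PM", "Scheduled"),
        ("C", "SC-000957", "12/27/2021 8:00:00 PM", "Completed"),
        ("C", "SC-001418", "2/7/2022 4:30:00 PM", "Completed"),
        ("C", "SC-003007", None, None),
        ("D", None, None, None),
    ]
]

for row in one_row_per_customer(rows):
    print(row["customer"], row["call"], row["status"])
```

Because the ranking is a tuple, the comparison falls through to the date only when two rows agree on being open, so this always returns exactly one row per customer.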

Default value for LAG function in MariaDB

I'm trying to build a view which allows me to track the difference between paid values at two consecutive month_ids. When a figure is missing however, that would be because it's the first entry and therefore has a paid amount of 0. At present, I'm using the below to represent the previous figure since the [,default] argument has not been implemented in MariaDB.
CASE WHEN (
NOT(policy_agent_month.policy_agent_month_id IS NOT NULL
AND LAG(days_paid, 1) OVER (PARTITION BY claim_id ORDER BY month_id ) IS NULL)) THEN
LAG(days_paid, 1) OVER ( PARTITION BY claim_id ORDER BY month_id)
ELSE
0
END
The problem I have with this is that I have about 30 variables which this function needs to be applied over and it makes my code unreadable and very clunky. Is there a better solution?

Why use WITH? Wrapping LAG() in COALESCE() supplies the default directly:
SELECT province, tot_pop,
tot_pop - COALESCE(
(LAG(tot_pop) OVER (ORDER BY tot_pop ASC)),
0) AS delta
FROM provinces
ORDER BY tot_pop asc;
+---------------------------+----------+---------+
| province | tot_pop | delta |
+---------------------------+----------+---------+
| Nunavut | 14585 | 14585 |
| Yukon | 21304 | 6719 |
| Northwest Territories | 24571 | 3267 |
| Prince Edward Island | 63071 | 38500 |
| Newfoundland and Labrador | 100761 | 37690 |
| New Brunswick | 332715 | 231954 |
| Nova Scotia | 471284 | 138569 |
| Saskatchewan | 622467 | 151183 |
| Manitoba | 772672 | 150205 |
| Alberta | 2481213 | 1708541 |
| British Columbia | 3287519 | 806306 |
| Quebec | 5321098 | 2033579 |
| Ontario | 10071458 | 4750360 |
+---------------------------+----------+---------+
13 rows in set (0.00 sec)
However, it is not cheap (at least in MySQL 8.0).
The table has only 13 rows, yet the handler counters come out several times higher:
FLUSH STATUS;
SELECT ...
SHOW SESSION STATUS LIKE 'Handler%';
MySQL 8.0:
+----------------------------+-------+
| Variable_name | Value |
+----------------------------+-------+
| Handler_read_rnd | 89 |
| Handler_read_rnd_next | 52 |
| Handler_write | 26 |
(and others)
MariaDB 10.3:
| Handler_read_rnd | 77 |
| Handler_read_rnd_next | 42 |
| Handler_tmp_write | 13 |
| Handler_update | 13 |

You can use a CTE (Common Table Expression) in MariaDB 10.2+ to pre-compute frequently used expressions and name them for later use:
with
x as ( -- first we compute the CTE that we name "x"
select
*,
coalesce(
LAG(days_paid, 1) OVER (PARTITION BY claim_id ORDER BY month_id),
123456
) as prev_month -- this expression gets the name "prev_month"
from my_table -- or a simple/complex join here
)
select -- now the main query
prev_month
from x
... -- rest of your query here where "prev_month" is computed.
In the main query prev_month has the lag value, or the default value 123456 when it's null.
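To show what COALESCE(LAG(...), default) produces, here is a small Python sketch of the same idea: the previous value within each partition, or a default when there is none. The row layout (claim_id, month_id, days_paid), the sample values, and the function name `lag_with_default` are assumptions for illustration only.

```python
from itertools import groupby
from operator import itemgetter


def lag_with_default(rows, default=0):
    """For each (claim_id, month_id, days_paid) row, append the previous
    days_paid within the same claim_id (ordered by month_id), or `default`
    when there is no previous row or the previous value is None."""
    out = []
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, group in groupby(ordered, key=itemgetter(0)):
        prev = None
        for claim_id, month_id, days_paid in group:
            out.append((claim_id, month_id, days_paid,
                        default if prev is None else prev))
            prev = days_paid
    return out


# Hypothetical sample data.
rows = [(1, 2, 15), (1, 1, 10), (2, 1, 7)]
result = lag_with_default(rows)
for row in result:
    print(row)
```

The first row of every claim gets the default, which is the behaviour the missing [,default] argument of LAG would otherwise provide.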

Calculation of Battery Consumption of each running mobile application

Is it possible to find out how much battery each mobile application consumes per day (using R), given collected data with the following fields?
record_id,
date_time,
application_name,
battery_level,
battery_status
battery_level is a number representing the available battery percentage.
battery_status is the status of the battery: charging, discharging, or full.
The calculation should be based on the collected data.
Example of such data:
+-----------+------------------+---------------------+---------------+----------------+
| record_id | application_name | date_time | battery_level | battery_status |
+-----------+------------------+---------------------+---------------+----------------+
| 473849 | viber | 2015-09-01 21:34:01 | 7 | Charging |
| 473850 | watsup | 2015-09-01 21:34:01 | 7 | Charging |
| 473851 | AccuWeather | 2015-09-01 21:34:01 | 7 | Charging |
+-----------+------------------+---------------------+---------------+----------------+

As I understand it, it is not possible to calculate the battery consumption of
each running mobile application using the data collected in my first post.
Let us consider another data collection.
Assume that we have the following data:
CPU usage per running application and
memory usage per running application,
as follows:
+-----------+------------------+---------------------+---------------------------------+------------------------------------+
| record_id | application_name | date_time | cpu_usage_per_app_in_percentage | memory_usage_per_app_in_percentage |
+-----------+------------------+---------------------+---------------------------------+------------------------------------+
| 473849 | viber | 2015-09-06 19:23:13 | 5 | 2 |
| 473850 | watsup | 2015-09-06 19:23:13 | 9 | 2 |
| 473851 | AccuWeather | 2015-09-06 19:23:13 | 8 | 4 |
| 473980 | viber | 2015-09-06 19:23:14 | 4 | 1 |
| 474254 | watsup | 2015-09-06 19:23:14 | 9 | 1 |
| 474323 | AccuWeather | 2015-09-06 19:23:14 | 9 | 2 |
| 474533 | viber | 2015-09-06 19:23:15 | 5 | 2 |
| 474536 | watsup | 2015-09-06 19:23:15 | 8 | 3 |
| 474537 | AccuWeather | 2015-09-06 19:23:15 | 5 | 3 |
| 474538 | calendar | 2015-09-06 19:23:15 | 7 | 3 |
+-----------+------------------+---------------------+---------------------------------+------------------------------------+
You can suggest any other way of collecting data. The key question is: is it possible to calculate the battery consumption of each running mobile application? If so, how, and what data should be collected?

Select single row per unique field value with SQL Developer

I have thousands of rows of data, a segment of which looks like:
+-------------+-----------+-------+
| Customer ID | Company | Sales |
+-------------+-----------+-------+
| 45678293 | Sears | 45 |
| 01928573 | Walmart | 6 |
| 29385068 | Fortinoes | 2 |
| 49582015 | Walmart | 1 |
| 49582015 | Joe's | 1 |
| 19285740 | Target | 56 |
| 39506783 | Target | 4 |
| 39506783 | H&M | 4 |
+-------------+-----------+-------+
In every case where a customer ID occurs more than once, the value in 'Sales' is the same but the value in 'Company' is different (this is true throughout the entire table). I need each value in 'Customer ID' to appear only once, so I need a single row for each customer ID.
In other words, I'd like for the above table to look like:
+-------------+-----------+-------+
| Customer ID | Company | Sales |
+-------------+-----------+-------+
| 45678293 | Sears | 45 |
| 01928573 | Walmart | 6 |
| 29385068 | Fortinoes | 2 |
| 49582015 | Walmart | 1 |
| 19285740 | Target | 56 |
| 39506783 | Target | 4 |
+-------------+-----------+-------+
If anyone knows how I can go about doing this, I'd much appreciate some help.
Thanks!

It would have been helpful if you had posted the SQL that generates that data,
but it might go something like:
SELECT customer_id, Max(Company) as company, Count(sales.*) From Customers <your joins and where clause> GROUP BY customer_id
This assumes that a customer can have several companies and that the sales data is in a different table. Note that Max(Company) picks the alphabetically last company name for each customer, not the most frequent one.
Hope this helps.
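The effect of GROUP BY with Max(Company) on the sample data from the question can be sketched in Python as follows. The tuple layout and the function name `dedupe_by_customer_id` are illustrative assumptions; customer IDs are kept as strings so the leading zero survives.

```python
def dedupe_by_customer_id(rows):
    """Keep one (customer_id, company, sales) row per customer_id,
    choosing the alphabetically last company, like MAX(Company)."""
    chosen = {}
    for customer_id, company, sales in rows:
        if customer_id not in chosen or company > chosen[customer_id][0]:
            chosen[customer_id] = (company, sales)
    return [(cid, company, sales) for cid, (company, sales) in chosen.items()]


rows = [
    ("45678293", "Sears", 45),
    ("01928573", "Walmart", 6),
    ("29385068", "Fortinoes", 2),
    ("49582015", "Walmart", 1),
    ("49582015", "Joe's", 1),
    ("19285740", "Target", 56),
    ("39506783", "Target", 4),
    ("39506783", "H&M", 4),
]

deduped = dedupe_by_customer_id(rows)
for row in deduped:
    print(row)
```

On this sample the alphabetically last company happens to be the one shown in the desired output (Walmart over Joe's, Target over H&M), but that is a coincidence of the data rather than a guarantee.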

Calculating Duration in org mode table

I'm trying to figure out how to use org-mode to calculate the duration between two time points. I figured out how to do it for two separate dates, but when I add the time component, it gives the answer as a fraction of a day. I'd rather have the answer in the form
XX days, xx hours, xx minutes
| Start | End | Duration |
|------------------------+------------------------+----------|
| <2013-07-16 Tue 15:15> | <2013-07-17 Wed 11:15> | 0.833333 |
| | | 0 |
#+TBLFM: $3=(date(<$2>)-date(<$1>))

You may use the T flag to use the form HH:MM[:SS]. Example:
| Start | End | Days | HH:MM:SS |
|------------------------+------------------------+----------+----------|
| <2013-07-15 Mon 10:15> | <2013-07-17 Wed 11:15> | 2.041667 | 49:00:00 |
| | | 0 | 00:00:00 |
#+TBLFM: $3=date(<$2>)-date(<$1>)::$4=60*60*24*$3;T
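The T flag gives HH:MM:SS rather than the "XX days, xx hours, xx minutes" wording asked for. As a rough illustration of the arithmetic behind that wording (outside org-mode, in Python), the difference can be split into days, hours, and minutes as below. The function name `format_duration` is made up, and the sketch assumes the end is not before the start.

```python
from datetime import datetime


def format_duration(start: datetime, end: datetime) -> str:
    """Format end - start as 'D days, H hours, M minutes' (assumes end >= start)."""
    delta = end - start
    hours, remainder = divmod(delta.seconds, 3600)
    minutes = remainder // 60
    return f"{delta.days} days, {hours} hours, {minutes} minutes"


# The two rows used in the question and in the answer above.
first = format_duration(datetime(2013, 7, 16, 15, 15), datetime(2013, 7, 17, 11, 15))
second = format_duration(datetime(2013, 7, 15, 10, 15), datetime(2013, 7, 17, 11, 15))
print(first)
print(second)
```

The second result corresponds to the 2.041667 days (49:00:00) shown in the table: two full days plus one hour.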
