Question using distinct and grouping on Windows Virtual Desktop session statistics - azure-data-explorer

I need some help creating a Kusto query for filtering and grouping Windows Virtual Desktop statistics.
What I need: A chart that shows the total number of user sessions in a WVD host pool.
Data available: WVD logs this information to Log Analytics for each host at a roughly regular interval. The interval is not exactly "every x seconds", but each host reports at least once every 3 minutes.
So I made this query for now:
WVDAgentHealthStatus
| where TimeGenerated > ago(3m)
| project SessionHostName, TimeGenerated, ActiveSessions, InactiveSessions, Totalsessions=(toint(ActiveSessions) + toint(InactiveSessions))
That produces results like this:
As you can see, some hosts are reported twice and some three times.
I need help with:
How do I make this query usable in a chart, so that it shows the number of sessions per host in 3-minute steps, covering the last 8 hours?
I guess the "where TimeGenerated" filter has to become "> ago(8h)", but then the data needs to be grouped into 3-minute buckets, with one distinct row per host per bucket.
I have no idea how to do this. I'm not that good at Kusto. Can anyone help me out?

You need to use the summarize operator:
WVDAgentHealthStatus
| where TimeGenerated > ago(8h)
| summarize arg_max(TimeGenerated, ActiveSessions) by bin(TimeGenerated, 3m), SessionHostName
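If you also want the question's total-session count and a chart, here is a sketch building on that answer (it reuses the total-sessions calculation from the question; max() per 3-minute bin stands in for arg_max, under the assumption that one value per host per bin is what the chart needs):
WVDAgentHealthStatus
| where TimeGenerated > ago(8h)
| extend TotalSessions = toint(ActiveSessions) + toint(InactiveSessions)
// one value per host per 3-minute bucket (the highest reading within the bucket)
| summarize TotalSessions = max(TotalSessions) by bin(TimeGenerated, 3m), SessionHostName
| render timechart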

Related

What is the best strategy for optimizing aggregation queries over a long period of time/dataset in Azure Data Explorer (Kusto)?

I have some queries that look at aggregated data over a long period of time (180 days) for per-second data (example query below). The table's hot cache is 31 days, so the queries can take over a minute to return, which is not acceptable for the dashboards I want to display them on. What would be the recommended optimization strategies? My thoughts so far are to either use an update policy to push the data for these tags into a separate table with a hot cache of 180 days, or to use a materialized view.
raw_table
| where TimeStamp between (now(-180d) .. now()) and TagName in ("Tag1", "Tag2")
| extend Date = startofday(TimeStamp)
// daily maximum per tag
| summarize Value1 = max(Value) by Date, TagName
// daily total across tags
| summarize Value1 = sum(Value1) by Date
| project TagName = "AggregatedData", Date, Value = Value1
Both options you mentioned are appropriate (even a combination of both, if required).
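For instance, a minimal materialized-view sketch for the daily per-tag maximum (the view name and the backfill option are assumptions; bin(TimeStamp, 1d) is used in place of startofday, to which it is equivalent here):
.create materialized-view with (backfill = true) DailyTagMax on table raw_table
{
    raw_table
    // one row per tag per day, maintained incrementally by the engine
    | summarize Value1 = max(Value) by Date = bin(TimeStamp, 1d), TagName
}
Dashboard queries can then sum the per-tag daily maxima from DailyTagMax instead of scanning 180 days of per-second rows.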

BigQuery to Data Studio : Show reliable COUNT DISTINCT regardless of the selected period

In my BigQuery project I store event data integrated from Firebase. The granularity and dimensionality are such that trying to present the raw data in Data Studio quickly makes the report VERY slow (1-2 min per page/interaction).
I then started to think how I could create pre-aggregated tables in BigQuery to speed everything up, but quickly realised COUNT DISTINCT metrics would be a problem with this approach.
Let me explain:
SELECT user, date
FROM UNNEST([
STRUCT("Adam" AS user, "20190923" AS date),
("Bob", "20190923"),
("Carl", "20190923"),
("Adam", "20190924"),
("Bob", "20190924"),
("Adam", "20190925"),
("Carl", "20190925"),
("Bob", "20190926")
]) AS website_visits;
+------+----------+
| User | Date     |
+------+----------+
| Adam | 20190923 |
| Bob | 20190923 |
| Carl | 20190923 |
| Adam | 20190924 |
| Bob | 20190924 |
| Adam | 20190925 |
| Carl | 20190925 |
| Bob | 20190926 |
+------+----------+
The above is a table of website visits.
Clearly, creating a pre-aggregated table like
SELECT date, COUNT(DISTINCT user) FROM website_visits GROUP BY date
has the limitation that the count cannot be aggregated further (much less dynamically) to get a total: doing a SUM would return 8 unique users, which is not correct, as there are only 3 unique users.
In BigQuery, this is fixed by using HLL_COUNT, which despite being an approximation works fine for me.
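A minimal sketch of that pattern (the dataset and table names are assumptions): HLL_COUNT.INIT stores a mergeable sketch per day, and HLL_COUNT.MERGE re-aggregates the sketches over any date range without double-counting users.
-- Pre-aggregate: one HLL sketch of users per day
CREATE TABLE mydataset.daily_user_sketches AS
SELECT date, HLL_COUNT.INIT(user) AS user_sketch
FROM mydataset.website_visits
GROUP BY date;

-- Query time: merge sketches over an arbitrary range
SELECT HLL_COUNT.MERGE(user_sketch) AS unique_users
FROM mydataset.daily_user_sketches
WHERE date BETWEEN "20190923" AND "20190926";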
Now to the big question:
How do I do the same so that the result is displayable in Data Studio?
HLL_COUNT.EXTRACT is not available as a function there, and in the reporting I always have to keep in mind that the date range is set by the user however (s)he likes, so it's not possible to store a pre-aggregated result for ALL cases...
EDIT 1: APPROX_COUNT_DISTINCT
As per the answer from Bobbylank, I tried to use APPROX_COUNT_DISTINCT.
However, I found that this just seems to move the issue down the line. My fault for not explaining what's going on over there.
Although performance is acceptable, it does not seem possible to me to blend a data source with this calculated metric.
Example: After displaying the number of unique users in the selected period (which now works), I'm also trying to display Average Revenue Per User (ARPU) in Data Studio like Firebase does.
To do this, I have to compute SUM(REVENUE) / APPROX_COUNT_DISTINCT(USER).
Clearly, REVENUE works fine with pre-aggregation and is available in the raw data. I then tried to blend the raw data with a table containing just user visits. However, APPROX_COUNT_DISTINCT can't be used in the blended data definition, as calculated metrics are not allowed there.
I even tried using the USER field as a metric with Count Distinct aggregation. It returns the correct figures when showing revenue and user count separately, but when I try to divide them, aggregation becomes the problem (applying SUM or AVG to the field basically yields AVG(REVENUE/USERS) for each day).
I also tried to store REVENUE directly in the visits table, but Data Studio reminded me that I can't mix dimensions and metrics in a calculated field.
APPROX_COUNT_DISTINCT might be more performance friendly for you?
https://support.google.com/datastudio/answer/9189108?hl=en
Otherwise, the only way I can think of would be to pre-calculate several metrics (e.g. unique users on that day, 7-day cumulative, 14-day, etc.) for each single day, as your customers require.
Or you could provide a two-page report with both of these methods, with the caveat that the first can be used over any time period but will be much slower.

Getting total request count to a cosmosDb using Log Analytics logs

In Azure you can see how many requests have been made against a Cosmos DB account in its overview tab. I want to get that same number (total requests) using Log Analytics diagnostic logs, but I am having trouble knowing which logs to count, since there are more logs than total requests.
Around March I used the logic that if a log had a full self-link (with database id/name and collection id/name) in the requestResourceId_s field, then I would count it. This seemed to work and the numbers added up, but when I revisited this a while back I noticed it doesn't work anymore. Then I tried filtering the logs with collectionName_s != "", requestLength_s != "0", requestCharge_s != "0.000000", using the distinct operator on the activityId, and combining the mentioned filters in different ways. But it always returns the wrong numbers, and I can't seem to find the total request count.
AzureDiagnostics
// extract "/dbs/<database>/" and "/colls/<collection>/" from the resource self-link
| extend requestDatabaseId = extract("(^(/dbs/.*?)/)", 1, requestResourceId_s)
| extend requestCollectionId = extract("((/colls/.*?)/)", 1, requestResourceId_s)
| where requestDatabaseId != "" and requestCollectionId != ""
This is the main part of the query I used to use to get the total request count. For example, it will count a log that has "/dbs/master/colls/master/docs" in requestResourceId_s.
For instance, if I see there have been 97 total requests, with my old logic there are now 326 logs that get counted.
Any help is appreciated.
Welcome to Stack Overflow.
AFAIK you should use the below Kusto query to get the total number of requests made.
AzureMetrics | where MetricName == "TotalRequests"
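To scope that to a time window and chart it, a sketch along these lines may help (the ResourceProvider filter value and the 5-minute grain are assumptions; AzureMetrics records expose the metric value in the Total column):
AzureMetrics
| where TimeGenerated > ago(24h)
| where ResourceProvider == "MICROSOFT.DOCUMENTDB" and MetricName == "TotalRequests"
| summarize TotalRequests = sum(Total) by bin(TimeGenerated, 5m)
| render timechart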
A prerequisite for this to work is to turn on logging via a diagnostic setting, as explained in this document. Make sure you tick the 'Requests' box under the Metric section.
Please refer to this document to see which metrics are currently supported. Should the supported metrics for any Azure resource change in the future, the information there would probably be updated.
Hope this helps!! Cheers!!

Analytics data deleted from Azure Application Insights

I noticed that when I query the event data, the total record count decreases automatically. I want to know when data is deleted from Application Insights. My query is just a simple count query. I also noticed that some events' timestamps were updated. I searched blogs etc. but found nothing.
customEvents | count
Unless you explicitly set a time range in the query directly, like
customEvents | where timestamp >= ago(7d) | count
then the Analytics website automatically applies a "last 24 hours" time restriction. (Look in the upper right corner: there is a dropdown that will say either "set in query", "last 24 hours", or other choices.)

Want to add records horizontally in a database

I am making an attendance management system in which I have to record the time in and time out of an employee several times a day.
This can easily be achieved if I add a new row for every time in and time out.
But the problem is that the number of employees is very high, so I want to add time in and time out horizontally rather than vertically.
How can that be done?
e.g.
id | Barcode | date | time in | time out | time in | time out | ....... | total time
I agree with the comments that you should keep the data in a vertical fashion, because it is a better design and utilizes the "relational" aspect of the database. Based on your description of the system having approximately 90,000 employees and maybe 5-10 rows per employee in the table (90k * 10), you should consider adding an index on the employee id field (FK) so that your queries are more efficient. By using an index, you should be able to store many records and not see any speed issues.
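A minimal sketch of that vertical design with the suggested index, in generic SQL (all table and column names here are assumptions):
-- One row per clock event keeps the schema fixed no matter how many
-- times an employee clocks in and out per day.
CREATE TABLE attendance (
    id          INT PRIMARY KEY,
    employee_id INT NOT NULL,        -- FK to the employees table
    event_time  TIMESTAMP NOT NULL,
    event_type  VARCHAR(3) NOT NULL  -- 'IN' or 'OUT'
);

-- Index so per-employee lookups stay fast at ~90k employees
CREATE INDEX idx_attendance_employee
    ON attendance (employee_id, event_time);
Total time per day can then be computed in a query by pairing each 'IN' with the following 'OUT', rather than being stored in ever-widening columns.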
