Application Insights not logging Questions and Answers from QNA Maker bot - azure-application-insights

When running the query, I was originally getting the Questions, Answers and Score of the questions asked to my bot. However, about 10 days ago it stopped logging.
I am still seeing activity in the Application Insights resource for QnA Maker, but the "Custom Dimensions" where Questions, Answers and Score are logged are not showing up.
Does anyone know how to fix this?

I think I figured it out:
traces
| where customDimensions contains "question"
| extend question = tostring(customDimensions['Question'])
| extend answer = tostring(customDimensions['Answer'])
| extend score = tostring(customDimensions['Score'])
| project timestamp, question, answer, score
For some reason the suggested query in the docs wasn't working for me, but the simplified query above worked. For reference, this is the query from the docs:
requests
| where url endswith "generateAnswer"
| project timestamp, id, url, resultCode, duration, performanceBucket
| parse kind = regex url with *"(?i)knowledgebases/"KbId"/generateAnswer"
| join kind= inner (
traces | extend id = operation_ParentId
) on id
| extend question = tostring(customDimensions['Question'])
| extend answer = tostring(customDimensions['Answer'])
| extend score = tostring(customDimensions['Score'])
| project timestamp, resultCode, duration, id, question, answer, score, performanceBucket,KbId

Related

BigQuery to Data Studio : Show reliable COUNT DISTINCT regardless of the selected period

In my BigQuery project I store event data integrated from Firebase. The granularity and number of dimensions are such that presenting the raw data in Data Studio quickly makes the report VERY slow (1-2 min per page/interaction).
I then started to think how I could create pre-aggregated tables in BigQuery to speed everything up, but quickly realised COUNT DISTINCT metrics would be a problem with this approach.
Let me explain:
SELECT user, date
FROM UNNEST([
STRUCT("Adam" AS user, "20190923" AS date),
("Bob", "20190923"),
("Carl", "20190923"),
("Adam", "20190924"),
("Bob", "20190924"),
("Adam", "20190925"),
("Carl", "20190925"),
("Bob", "20190926")
]) AS website_visits;
+------+----------+
| User | Date     |
+------+----------+
| Adam | 20190923 |
| Bob | 20190923 |
| Carl | 20190923 |
| Adam | 20190924 |
| Bob | 20190924 |
| Adam | 20190925 |
| Carl | 20190925 |
| Bob | 20190926 |
+------+----------+
The above is a table of website visits.
Clearly, creating a pre-aggregated table like
SELECT date, COUNT(DISTINCT user) FROM website_visits GROUP BY date
has the limitation that the count cannot be aggregated further (let alone dynamically) to get a total: a SUM over the daily counts would return 8 unique users, which is not correct, as there are only 3 unique users.
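To make the problem concrete, here is a minimal Python sketch using the same eight visits from the table above. It is only an illustration of why per-day distinct counts cannot be summed, not a suggestion for how to query BigQuery; the variable names are made up for this example.

```python
# Website visits from the question: (user, date) pairs.
visits = [
    ("Adam", "20190923"), ("Bob", "20190923"), ("Carl", "20190923"),
    ("Adam", "20190924"), ("Bob", "20190924"),
    ("Adam", "20190925"), ("Carl", "20190925"),
    ("Bob", "20190926"),
]

# Pre-aggregation: distinct users per day (what the GROUP BY date query stores).
users_by_date = {}
for user, date in visits:
    users_by_date.setdefault(date, set()).add(user)
daily_counts = {date: len(users) for date, users in users_by_date.items()}

# Summing the stored daily counts counts returning users once per day they visit.
sum_of_daily_counts = sum(daily_counts.values())

# The true figure needs the raw user values (or a mergeable sketch of them).
true_distinct_users = len({user for user, _ in visits})

print(daily_counts)
print(sum_of_daily_counts, true_distinct_users)
```

The daily counts are 3, 2, 2 and 1, so the sum is 8, while the set of all users has only 3 members.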
In BigQuery, this is fixed by using HLL_COUNT, which despite the approximation works ok for me.
Now to the big question:
How can I do the same so that the result is displayable in Data Studio?
HLL_COUNT.EXTRACT is not available as function in there, and in the reporting I always have to keep in mind that the date range is set by the user however (s)he likes so it's not possible to store a pre-aggregated result for ALL cases...
EDIT 1: APPROX_COUNT_DISTINCT
As per answer from Bobbylank, I tried to use APPROX_COUNT_DISTINCT.
However I found that this just seems to move the issue down the line. My fault for not explaining what's over there.
Although the performance is acceptable, it does not seem possible to blend a data source with this calculated metric.
Example: After displaying the amount of unique users in the selected period (which now works), I'm also trying to display Average Revenue Per User (ARPU) in Data Studio like Firebase does.
To do this, I have to SUM(REVENUE) / APPROX_COUNT_DISTINCT(USER)
Clearly, REVENUE works ok with pre-aggregation and is available in the raw data. I tried then to blend the raw data with a table containing just user visits. However APPROX_COUNT_DISTINCT can't be used in the blended data definition as calculated metrics are not allowed.
Even trying to use the USER field as a metric with Count Distinct aggregation, despite returning the correct figures when showing revenue and user count separately, when I try to divide them the problem becomes aggregation (apply SUM or AVG to the field and basically the result will be AVG(REVENUE/USERS) for each day).
I also then tried to store REVENUE directly in the visits table, but was reminded by Data Studio that I can't mix dimensions and metrics in a calculated field.
APPROX_COUNT_DISTINCT might be more performance friendly for you?
https://support.google.com/datastudio/answer/9189108?hl=en
Otherwise the only way I can think of would be to pre-calculate, for each single day, the several metrics your customers require (e.g. unique users on that day, 7-day cumulative, 14-day, etc.).
Or you could provide a 2 page report with both of these methods with the caveat that the first can be used over a time period but will be much slower?
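The pre-calculation idea (unique users per day over several trailing windows) could be sketched like this in Python, using the visits from the question. The function name and the window lengths of 1 and 2 days are assumptions chosen so the tiny sample shows a difference; in practice this would be a scheduled BigQuery query writing one row per day and window.

```python
from datetime import date, timedelta

# Website visits from the question, with dates parsed.
visits = [
    ("Adam", date(2019, 9, 23)), ("Bob", date(2019, 9, 23)), ("Carl", date(2019, 9, 23)),
    ("Adam", date(2019, 9, 24)), ("Bob", date(2019, 9, 24)),
    ("Adam", date(2019, 9, 25)), ("Carl", date(2019, 9, 25)),
    ("Bob", date(2019, 9, 26)),
]

def trailing_unique_users(visits, window_days):
    """For each day with data, count distinct users in the window ending on that day."""
    days = sorted({day for _, day in visits})
    result = {}
    for end in days:
        start = end - timedelta(days=window_days - 1)
        result[end] = len({user for user, day in visits if start <= day <= end})
    return result

one_day = trailing_unique_users(visits, 1)   # plain daily uniques
two_day = trailing_unique_users(visits, 2)   # uniques over each day plus the day before

for day in one_day:
    print(day.isoformat(), one_day[day], two_day[day])
```

Each stored value is only valid for its own window, so the report can offer a fixed set of periods but not an arbitrary user-selected date range.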

What is the terminology for the report file that generates reports?

I am a bit confused about how to present two different concepts to users: the file that generates the report from given data, and the final report.
We usually use the same term as Report and depending on the context we can understand if this is the report generation file or the final report. This normally is not a problem because the report generation file is created by developers and the end user only see the final report.
In my application I have to describe both of these concepts to my end users and I am pretty confused because I cannot find a proper terminology for this.
"Template" is the term I would use, it's been around for a very long time (I remember using the term template for mail merge operations before the PC was even a thing).
Basically, the template shows how the data is used to create the final output, in your case:
+----------+   +------+
| Report   |   | Data |
| template |   +------+
+----------+      |
     |            |       +--------+
     +------------+-----> | Report |
                          +--------+

Optimize complex scenario in Cucumber

I have been working on an automation project where I have to write Cucumber tests for a search filter. The search filter works dynamically and its parameters are nested: each dropdown is populated based on the previous selection. For example, on selecting "Subscribers", the parameters in the next dropdown are "Name", "City", "Network". Likewise, on selecting "Service Desk", the parameters in the subsequent dropdown are "Status", "Ticket no.", "Assignee". I am using a Scenario Outline as below:
Scenario Outline: As a user, I can search records
Given I am on search page
When I search on "<category>" and "<nestedfilter>"
Then I see records having "<category>" category
Examples:
|category |nestedfilter|
|Subscribers |Name |
|Subscribers |City |
|Subscribers |Network |
|Service Desk|Status |
|Service Desk|Ticket no. |
|Service Desk|Assignee |
The filter could be more complex as there could be more nested filters based on previous nested filters.
All I need to know is whether there is a more efficient way to handle this problem. For example, passing a data table to the step definition, which I am not too sure about.
Thanks
If you really need the order of your items to be preserved, use a data table instead of a scenario outline.
A scenario outline is a shorthand notation for multiple scenarios. The execution order of those scenarios is not guaranteed, or at least it would be a mistake to assume a specific execution order. The order of the items in a data table will not change if you use a List as the argument, and is therefore a lot safer in your case.
A common mistake with Cucumber is to use Scenario Outline and example tables to do some sort of semi-exhaustive testing. This tends to hide lots of interesting things about the functionality being developed.
I would start writing single features for the searches you are working with and explore what those searches are and why they are important. So if we start with your first one we get ...
Note: all of the following assumes a background step Given I am searching
When I search on subscribers and name
Then I should see records for subscribers
and with the second one
When I search on subscribers and city
Then I should see records for subscribers
Now it becomes clear that there is a serious flaw in these scenarios, as both scenarios are looking for the same result.
So what you are actually testing is that
The subscribers search has name and city filters
A subscriber search should return subscriber results
Now you can refactor and get
When I do a subscriber search
Then I should see city, name, network filters
When I do a subscriber search
Then I should only see subscriber results
note: This is already much more efficient as you have reduced the number of scenarios from 3 to 2, and reduced the number of searches you have to do from 3 to 1.
Now I have no idea if this is what you want to do, but this is what your current scenario is doing. However because you are using an Outline and Example tables you can't see this.
The fact that you have a drop-down and nested filters is an implementation detail, which describes how the user is trying to achieve what they want to achieve.
If you think of what you're trying to do as examples of how the system behaves, rather than tests, it might be easier. You're not looking for something exhaustive. You also want your scenarios to be specific, so that you're illustrating them with realistic data and concrete examples. If you would commonly have some typical data available, that's a perfect thing to set up using Background.
So for instance, I might have scenarios like:
Background:
Given I have subscribers
| Name | City | Network | Status | etc.
| Bob | Rome | ABC | Alive | ...
| Sam | Berlin | ABC | Dead | ...
| Sue | Berlin | DEF | Dead | ...
| Ann | Berlin | DEF | Alive | ...
| Jon | London | DEF | Dead | ...
Scenario: First level search
Given I'm on the search page
When I search for Subscribers who are in Rome
Then I should see Bob
But not Sue or Jon.
Scenario: Second level search
Given I'm on the search page
When I search for Subscribers in Berlin on the ABC network
Then I should see Sam
But not Sue or Ann
etc.
The full-system scenarios should be just enough to understand what's going on. Don't use BDD for regression. It can help with that, but scenarios will rapidly become slow and unmaintainable if you try to cover every case. Delegate to integration and unit tests where appropriate (see "the testing pyramid").

Using application insights REST API for reading custom events

We have a custom event put in place on page which tracks the link clicks on given page to app insights. And with the REST API we would like to get the frequently accessed links from app insights.
How can we build the Query to get this analytics data, any sample on reading custom events available?
Thanks
If you open the Application Insights Analytics website for any resource, there are some "Common Queries" examples right on the front page. One of them is called "Usage", and if you click it, it will show you this one:
//What are the top 10 custom events of your application in the past 24 hours?
customEvents
| where timestamp >= ago(24h)
| summarize dcount(user_Id), count() by name
| top 10 by count_
| render barchart
which:
queries customEvents,
filtering to the last 24 hours (timestamp >= ago(24h)),
does a summary of the distinct count of users (dcount(user_Id)) and the total number of events (count()), grouped by the event name (by name),
then filters to the top 10 by the count_ field created from the summarization (top 10 by count_)
and then renders it as a bar chart (render barchart)
There are many other examples on the analytics home page as well.
Edit to add: You can easily query any custom properties or metrics that you send as well. The customDimensions and customMeasurements fields in each event type are JSON-typed fields, and if there are no spaces in the names, you can just use dot notation to grab values. If the field name has spaces or special characters, use brackets and quotes:
customEvents
| where timestamp >= ago(1h)
| extend a = customDimensions.NameOfFieldWithNoSpacesOrSpecialCharacters
| extend b = customDimensions["Field with spaces"]
| extend duration = customMeasurements["Duration (ms)"]
| project a, b, duration
| limit 10
(You don't need to use extend; you can use the fields this way with extend, project, summarize or any other operator or function. I just used extend for the example here.)
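Since the question asks about the REST API specifically, here is a rough Python sketch of how the query above might be wrapped in a request. It assumes the public query endpoint at api.applicationinsights.io with an API key sent in the x-api-key header; the app ID, key and helper name are placeholders made up for this example, and the endpoint and authentication details should be checked against the current API documentation. The request is only built, not sent.

```python
import json
import urllib.request

# Assumed base URL of the Application Insights query API.
API_ROOT = "https://api.applicationinsights.io/v1/apps"

def build_query_request(app_id: str, api_key: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries an Analytics query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_ROOT}/{app_id}/query",
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# The "top custom events" query from above, without the render step.
TOP_EVENTS_QUERY = """
customEvents
| where timestamp >= ago(24h)
| summarize dcount(user_Id), count() by name
| top 10 by count_
"""

# Placeholder credentials: replace with the values from your own resource.
request = build_query_request("YOUR-APP-ID", "YOUR-API-KEY", TOP_EVENTS_QUERY)
print(request.get_method(), request.full_url)

# Sending it needs real credentials, so this part is left commented out:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

The render barchart line is dropped because it only affects how the Analytics portal displays the result.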

Concatenate Google Analytics results to ignore country code in URL

Our website automatically detects a user's region. Though the site structure remains the same across all regions, the content on the page can vary.
As such, URLs are formatted like so: http://website.com/XX/pagename with XX=country code (e.g. GB, US, IT, etc.)
On Google Analytics, I want to see all of the different country versions of a single page contained as a single result.
For example, if I look at our top pages for January, I see:
| URL | page views |
|-------------------------|------------|
| website.com/US/page1 | 100 |
| website.com/GB/homepage | 60 |
| website.com/US/homepage | 40 |
| website.com/GB/page1 | 20 |
But what I want to see is:
| URL | page views |
|----------------------|------------|
| website.com/page1 | 120 |
| website.com/homepage | 100 |
Wherein the same URL (ignoring country code) is concatenated into one figure.
Is such a thing possible?
My end game here is a desire to see what our most popular pages are across the site in total, regardless of which country the user is browsing from.
Thanks!
One option is to use an advanced filter in GA so that you take something like website.com/US/page1 and replace it with website.com/page1. This only works on data moving forward from when the filter is applied, and does not change historical data, and cannot be undone once applied. This is another reason why it's always a good idea to have a Raw view which is unfiltered.
For the Advanced Filter, you need a rule that looks for the pattern /{any two letters}/{anything else} and outputs just the /{anything else} part.
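The pattern described above can be tried out in Python before setting up the filter. This is only a sketch of the rewrite logic applied to the page views from the question; the regular expression and function name are assumptions for illustration, and the exact field names and syntax in the GA filter dialog may differ.

```python
import re
from collections import Counter

# A leading slash, exactly two letters, another slash, then the rest of the path.
COUNTRY_PREFIX = re.compile(r"^/[A-Za-z]{2}/(.*)$")

def strip_country(path: str) -> str:
    """Return the path without a leading two-letter country segment."""
    return COUNTRY_PREFIX.sub(r"/\1", path)

# Page views from the question, keyed by request path.
views = {
    "/US/page1": 100,
    "/GB/homepage": 60,
    "/US/homepage": 40,
    "/GB/page1": 20,
}

# Combine the country versions of each page into one total.
totals = Counter()
for path, count in views.items():
    totals[strip_country(path)] += count

for path, count in totals.most_common():
    print(path, count)
```

Note that a pattern this simple also strips any other two-letter first segment (for example a hypothetical /my/account page), so it may need an explicit list of country codes.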
